Benchmarks · ThrottleKit 1.0.0

Measured, not claimed.

Every number here was produced by the harnesses in bench/ and is reproducible on your hardware with the commands at the bottom. The in-process rows are the library's real cost; the Redis/Postgres rows are dominated by the Docker-Desktop-on-Windows round trip on this box — included for relative shape, not as absolute latency you'll see in production.

Read this first — methodology
One machine, one day. Here's exactly how.
  • Best of N independent process launches — the hot rows reproduced to ±1 ns/op across launches; the rest to a few percent.
  • Warm up, then time — every loop JIT-warms (100k sync / 10k async) before the timed window, so numbers are steady-state, not cold-start.
  • ALLOW path, single hot key — limits set high enough nothing is ever denied, isolating the algorithm's cost from deny-handling.
  • Same process, same warmup, same iteration count for every contender — and the algorithm each library actually implements is printed next to its row. A bare counter and a GCRA cell are not the same guarantee at the same ops/s.
  • Allocations are the net heap delta across the timed loop under --expose-gc, divided by iterations; 0–1 B/op means effectively allocation-free in steady state.
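The warm-up, timing, and allocation steps above can be sketched as a small helper. This is a minimal illustration, not the code in bench/: the names `benchSync` and `bestOf` are hypothetical, and the best-of-N selection here runs inside one process, whereas the published numbers use independent process launches.

```typescript
// Hypothetical micro-benchmark helper; names are illustrative, not the bench/ harness API.
interface BenchResult {
  nsPerOp: number;
  opsPerSec: number;
  bytesPerOp: number;
}

function benchSync(fn: () => unknown, iterations: number, warmup: number): BenchResult {
  // Warm up so the JIT has optimized fn before the timed window opens.
  for (let i = 0; i < warmup; i++) fn();

  // gc is only defined when Node is started with --expose-gc; otherwise this is a no-op.
  const gc = (globalThis as { gc?: () => void }).gc;
  if (gc) gc();
  const heapBefore = process.memoryUsage().heapUsed;

  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedNs = Number(process.hrtime.bigint() - start);

  const heapAfter = process.memoryUsage().heapUsed;
  const nsPerOp = elapsedNs / iterations;
  return {
    nsPerOp,
    opsPerSec: 1e9 / nsPerOp,
    // Net heap delta across the timed loop, clamped at zero, divided by iterations.
    bytesPerOp: Math.max(0, heapAfter - heapBefore) / iterations,
  };
}

// Keep the run with the lowest ns/op (throws on an empty list).
function bestOf(runs: BenchResult[]): BenchResult {
  return runs.reduce((best, r) => (r.nsPerOp < best.nsPerOp ? r : best));
}
```

Without --expose-gc the heap delta includes whatever garbage happened to be live at either end of the loop, so the allocation figure is only meaningful when the flag is set.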
CPU: AMD Ryzen AI 9 HX 370 (12c / 24t)
Memory: 31.1 GB
OS: Windows 11 (10.0.26200)
Node: v24.13.1
Redis: redis:7-alpine · Docker Desktop · :6380
Postgres: postgres:16 · Docker Desktop · :5433
In-process · MemoryStore
The path that runs before you ever touch a network.

A complete GCRA decision — allowed, limit, remaining, resetAt, retryAfterMs — in 169 ns, with no Promise, no microtask, and no steady-state allocation. checkSync is available whenever the store is synchronous (in-process, or a warmed two-tier lease).

Synchronous checkSync — the fast path
| Strategy | ns/op | ops/sec | alloc |
|---|---|---|---|
| gcra | 169 | 5.9M | ~1 B/op |
| tokenBucket | 181 | 5.5M | 0 B/op |
| fixedWindow | 193 | 5.2M | 0 B/op |
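To make "a complete GCRA decision" concrete, here is a minimal synchronous sketch of the textbook algorithm that returns the same five fields. It is not ThrottleKit's implementation: the class name `GcraSketch`, the `check(key, now)` signature, and the choice of a burst equal to the full limit are assumptions for illustration.

```typescript
interface Decision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAt: number;      // ms timestamp at which the key is fully replenished
  retryAfterMs: number; // 0 when allowed
}

// Hypothetical in-memory GCRA cell: one theoretical arrival time (TAT) per key.
class GcraSketch {
  private readonly tats = new Map<string, number>();
  private readonly limit: number;
  private readonly periodMs: number;
  private readonly intervalMs: number; // emission interval: one permit every periodMs / limit

  constructor(limit: number, periodMs: number) {
    this.limit = limit;
    this.periodMs = periodMs;
    this.intervalMs = periodMs / limit;
  }

  check(key: string, now: number): Decision {
    // A key that has been idle behaves as if its TAT were "now".
    const tat = Math.max(this.tats.get(key) ?? now, now);
    const newTat = tat + this.intervalMs;
    // Earliest instant at which this request conforms, given a burst of `limit`.
    const allowAt = newTat - this.periodMs;

    if (now < allowAt) {
      // Denied: state is left untouched.
      return {
        allowed: false,
        limit: this.limit,
        remaining: 0,
        resetAt: tat,
        retryAfterMs: Math.ceil(allowAt - now),
      };
    }

    this.tats.set(key, newTat);
    const remaining = Math.floor((this.periodMs - (newTat - now)) / this.intervalMs);
    return { allowed: true, limit: this.limit, remaining, resetAt: newTat, retryAfterMs: 0 };
  }
}
```

The whole decision is a Map lookup, a handful of arithmetic operations, and one Map write, with no Promise involved, which is why a synchronous path of this shape can plausibly sit in the hundreds of nanoseconds.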
Concurrency guards — acquire() + release()
| Guard | ns/op | ops/sec | alloc |
|---|---|---|---|
| adaptiveConcurrency (AIMD, single process) | 291 | 3.4M | 1 B/op |
| distributedAdaptiveConcurrency (post-heartbeat, local) | 138 | 7.2M | ~0 B/op |

The distributed guard's steady-state acquire is a lean local in-flight check against its leased share — the coordinator round trip happens off the request path on the heartbeat — so it's cheaper per request than the single-process AIMD controller that samples RTT inline.
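A rough sketch of why the leased-share check is so cheap: the request path only compares two local integers, and the share itself changes only when a heartbeat arrives. The class and method names below (`LeasedShareGuard`, `onHeartbeat`) are hypothetical and do not reflect ThrottleKit's actual API.

```typescript
// Hypothetical guard: the coordinator assigns this node a share of the global
// concurrency limit; acquire() never performs I/O.
class LeasedShareGuard {
  private inFlight = 0;
  private share: number;

  constructor(initialShare: number) {
    this.share = initialShare;
  }

  // Request path: a local comparison and an increment.
  acquire(): boolean {
    if (this.inFlight >= this.share) return false;
    this.inFlight++;
    return true;
  }

  // Must be called once for every successful acquire().
  release(): void {
    if (this.inFlight > 0) this.inFlight--;
  }

  // Heartbeat path: runs off the request path when the coordinator reports a new share.
  onHeartbeat(newShare: number): void {
    this.share = Math.max(0, Math.floor(newShare));
  }
}
```

In this sketch, shrinking the share does not evict requests already in flight; new acquisitions are simply refused until enough releases bring the count back under the new share.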

Head-to-head vs. incumbents
Same process, same budget — algorithm printed per row.
Memory tier — in-process, ALLOW path
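| Contender | Algorithm | API | ns/op | ops/sec |
|---|---|---|---|---|
| throttlekit checkSync | GCRA | sync | 169 | 5.9M |
| throttlekit check | GCRA | async | 301 | 3.3M |
| throttlekit check | fixed-window | async | 300 | 3.3M |
| rate-limiter-flexible | fixed-window | async | 331 | 3.0M |
| express-rate-limit | fixed-window¹ | async | 199 | 5.0M |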

¹ express-rate-limit's measured op is its MemoryStore.increment() — a bare counter bump; the limit decision happens later in middleware and is not included. So ThrottleKit's sync path computes a full GCRA decision in 169 ns — faster than a bare counter increment (199 ns) — and on the same async shape its GCRA (301 ns) beats rate-limiter-flexible's counter (331 ns).

Redis tier — one atomic round trip, ALLOW path
| Contender | Algorithm | ops/sec | p50 | p99 | p99.9 |
|---|---|---|---|---|---|
| throttlekit RedisStore | GCRA | 778 | 1.19 ms | 2.39 ms | 3.87 ms |
| rate-limiter-flexible | fixed-window | 752 | 1.23 ms | 2.58 ms | 4.45 ms |

Both do exactly one atomic Lua round trip per request and land within noise — ThrottleKit's proven GCRA transform costs nothing extra over a counter. The absolute latency is the Docker-on-Windows loopback (~1.2 ms p50), not Redis — a same-AZ managed Redis is typically 150–300 µs.
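For readers reproducing the p50 / p99 / p99.9 columns from their own latency samples, a nearest-rank percentile is one common way to compute them. The source does not state which percentile method the harness uses, so treat this as an assumption; `percentile` is a hypothetical helper name.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  // Copy before sorting so the caller's array is not reordered.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(1, rank) - 1];
}
```

With only a few hundred samples, p99.9 collapses onto the maximum, so tail columns are only informative when the timed window collects well over a thousand round trips.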

Postgres tier — single hot key, ALLOW path
| Contender | Algorithm | round trips | ops/sec | p50 | p99 |
|---|---|---|---|---|---|
| throttlekit PostgresStore | GCRA | ~5 (txn) | 121 | 7.9 ms | 12.9 ms |
| rate-limiter-flexible | fixed-window | 1 (upsert) | 348 | 2.7 ms | 5.2 ms |
| throttlekit twoTier(leased) | GCRA | 1 / 100 req | 12.3k | 81 µs | - |

An honest split: on a single shared counter, rate-limiter-flexible's specialized one-statement UPSERT (348 ops/sec) beats ThrottleKit's generic read-modify-write transaction (~5 round trips, 121 ops/sec), which reuses the same proven transform across every strategy. Front that same Postgres with twoTier leasing and throughput jumps to 12.3k ops/sec, ~35× the incumbent's UPSERT and ~100× ThrottleKit's own raw transaction path, with no equivalent in the incumbent.

The two-tier lever
An exact global limit that doesn't pay a round trip per request.
Leased vs strict — bench/run.ts --redis
Modeops/secL2 round trips
strict GCRA (1 EVALSHA / request)7831 per request
twoTier(leased), batch 10066.4k1 per 100 requests

Same correctness envelope (the lease math is machine-checked), ~85× the throughput, because 99% of requests never leave the process. Larger batches trade a looser per-node burst for fewer round trips; adaptive lease sizing tunes the batch to observed demand.
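The round-trip arithmetic behind leasing can be shown with a toy model. This is a deliberately simplified sketch: `FakeL2` stands in for the shared Redis or Postgres store, the budget is a plain counter rather than GCRA state, the lease call is synchronous, and every name is hypothetical.

```typescript
// Hypothetical stand-in for the shared L2 store; counts simulated round trips.
class FakeL2 {
  roundTrips = 0;
  private remaining: number;

  constructor(globalBudget: number) {
    this.remaining = globalBudget;
  }

  // Grants up to `want` permits in one simulated round trip.
  lease(want: number): number {
    this.roundTrips++;
    const granted = Math.min(want, this.remaining);
    this.remaining -= granted;
    return granted;
  }
}

// Serves requests from a locally held batch and only visits L2 when the batch runs dry.
class LeasedLimiter {
  private local = 0;
  private readonly l2: FakeL2;
  private readonly batch: number;

  constructor(l2: FakeL2, batch: number) {
    this.l2 = l2;
    this.batch = batch;
  }

  tryAcquire(): boolean {
    if (this.local === 0) this.local = this.l2.lease(this.batch);
    if (this.local === 0) return false; // global budget exhausted
    this.local--;
    return true;
  }
}
```

Because permits are only ever handed out by L2, the total admitted across all nodes cannot exceed the global budget; what the batch size changes is how many permits a single node can hold, and therefore burst, at once.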

Caveats · reproduce
Run them on your hardware.

  • Redis/Postgres absolute latency is the local Docker network, not the database: on Windows, Docker Desktop's loopback adds ~1 ms, so the relative shape holds but the absolute p50 does not transfer.
  • Sync vs async rows aren't directly comparable: checkSync has no Promise cost; the async GCRA / fixed-window rows are the fair head-to-head.
  • Date.now() is cheaper on Linux than Windows, so in-process numbers may be a few percent faster on a Linux runner.
  • Single hot key: multi-key effects are deliberately excluded to isolate algorithm cost.

# In-process micro-benchmarks (+ allocations, + Redis):
npm run bench
node --expose-gc --import tsx bench/run.ts --redis

# Head-to-head vs rate-limiter-flexible and express-rate-limit:
npm run bench:compare

The CI bench-regression gate (npm run bench:gate) guards these numbers on every push using a machine-independent relative metric, so a hot-path regression fails the build rather than silently shipping.
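The page does not say which relative metric the gate uses. One plausible reading, sketched here purely as an assumption, is to divide the hot-path cost by the cost of a fixed reference loop measured in the same process, so that a faster or slower CI runner scales both numbers together. The names `relativeCost` and `passesGate` and the 10% default tolerance are hypothetical.

```typescript
interface GateMeasurement {
  hotNsPerOp: number;       // the path under guard, e.g. a sync rate-limit check
  referenceNsPerOp: number; // a fixed calibration loop timed in the same process
}

// Machine speed largely cancels out of the ratio.
function relativeCost(m: GateMeasurement): number {
  return m.hotNsPerOp / m.referenceNsPerOp;
}

// Passes when the current ratio is within `tolerance` of the committed baseline ratio.
function passesGate(current: GateMeasurement, baselineRatio: number, tolerance = 0.1): boolean {
  return relativeCost(current) <= baselineRatio * (1 + tolerance);
}
```

A gate of this shape fails only when the hot path gets slower relative to the reference work, not when the runner itself happens to be slower that day.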

Numbers you can re-run. Bounds you can re-check.