Measured, not claimed.
Every number here was produced by the harnesses in bench/ and is reproducible on your hardware with the commands at the bottom. The in-process rows are the library's real cost; the Redis/Postgres rows are dominated by the Docker-Desktop-on-Windows round trip on this box — included for relative shape, not as absolute latency you'll see in production.
- Best of N independent process launches — the hot rows reproduced to ±1 ns/op across launches; the rest to a few percent.
- Warm up, then time — every loop JIT-warms (100k sync / 10k async) before the timed window, so numbers are steady-state, not cold-start.
- ALLOW path, single hot key — limits set high enough nothing is ever denied, isolating the algorithm's cost from deny-handling.
- Same process, same warmup, same iteration count for every contender — and the algorithm each library actually implements is printed next to its row. A bare counter and a GCRA cell are not the same guarantee at the same ops/s.
- Allocations are the net heap delta across the timed loop under --expose-gc, divided by iterations; 0–1 B/op means effectively allocation-free in steady state.
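The procedure in the list above can be sketched as a small harness. This is an illustration under stated assumptions, not the code in bench/: the names `bench`, `bestOf`, and `forceGc` are hypothetical, and it takes the best of several in-process runs, whereas the published numbers are the best of separate process launches.

```typescript
// Hypothetical micro-benchmark harness: warm up, time, report ns/op and
// net heap bytes per op. Names here are illustrative, not from bench/.

interface BenchResult {
  nsPerOp: number;
  opsPerSec: number;
  bytesPerOp: number;
}

function forceGc(): void {
  // `gc` is only defined when Node is started with --expose-gc.
  const gc: unknown = Reflect.get(globalThis, "gc");
  if (typeof gc === "function") gc();
}

function bench(op: () => void, warmup: number, iterations: number): BenchResult {
  // Warm-up: let the JIT optimise the loop before the timed window.
  for (let i = 0; i < warmup; i++) op();

  forceGc();
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();

  for (let i = 0; i < iterations; i++) op();

  const elapsedNs = Number(process.hrtime.bigint() - start);
  forceGc();
  const heapAfter = process.memoryUsage().heapUsed;

  const nsPerOp = elapsedNs / iterations;
  return {
    nsPerOp,
    opsPerSec: 1e9 / nsPerOp,
    // Net heap delta across the timed loop, divided by iterations.
    bytesPerOp: Math.max(0, heapAfter - heapBefore) / iterations,
  };
}

// Best of N: keep the run with the lowest ns/op.
function bestOf(runs: BenchResult[]): BenchResult {
  return runs.reduce((best, r) => (r.nsPerOp < best.nsPerOp ? r : best));
}

let counter = 0;
const runs = [1, 2, 3].map(() => bench(() => { counter++; }, 100_000, 1_000_000));
const best = bestOf(runs);
console.log(best);
```

Without --expose-gc the `forceGc` call is a no-op, so the bytes-per-op figure then includes uncollected garbage and should not be read as an allocation count.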
A complete GCRA decision — allowed, limit, remaining, resetAt, retryAfterMs — in 169 ns, with no Promise, no microtask, and no steady-state allocation. checkSync is available whenever the store is synchronous (in-process, or a warmed two-tier lease).
| Strategy | ns/op | ops/sec | alloc |
|---|---|---|---|
| gcra | 169 | 5.9M | ~1 B/op |
| tokenBucket | 181 | 5.5M | 0 B/op |
| fixedWindow | 193 | 5.2M | 0 B/op |
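To make "a complete GCRA decision" concrete, here is a minimal sketch of a generic cell rate algorithm check that returns the same five fields synchronously. It is a textbook-style GCRA, not ThrottleKit's source: the class `GcraSketch`, its constructor arguments, and the exact meaning given to `resetAt` are assumptions for illustration. Unlike the measured hot path, it allocates a result object per call.

```typescript
// Generic GCRA decision, for illustration only. GcraSketch and its fields
// are hypothetical and do not come from the library.

interface Decision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAt: number; // ms timestamp at which the full burst is available again
  retryAfterMs: number; // 0 when allowed
}

class GcraSketch {
  private tat = 0; // theoretical arrival time (ms)
  private readonly limit: number;
  private readonly intervalMs: number; // emission interval: time "cost" of one request
  private readonly toleranceMs: number; // how far the TAT may run ahead of now

  constructor(limit: number, periodMs: number) {
    this.limit = limit;
    this.intervalMs = periodMs / limit;
    this.toleranceMs = periodMs;
  }

  // Plain arithmetic on one stored number: no Promise, no microtask.
  checkSync(now: number): Decision {
    const tat = Math.max(this.tat, now);
    const newTat = tat + this.intervalMs;
    const allowAt = newTat - this.toleranceMs;

    if (now < allowAt) {
      // Denied: stored state is left untouched.
      return {
        allowed: false,
        limit: this.limit,
        remaining: 0,
        resetAt: tat,
        retryAfterMs: Math.ceil(allowAt - now),
      };
    }

    this.tat = newTat;
    const remaining = Math.floor((this.toleranceMs - (newTat - now)) / this.intervalMs);
    return { allowed: true, limit: this.limit, remaining, resetAt: newTat, retryAfterMs: 0 };
  }
}

// 5 requests per second, six back-to-back calls at the same instant.
const gcra = new GcraSketch(5, 1000);
const decisions: Decision[] = [];
for (let i = 0; i < 6; i++) decisions.push(gcra.checkSync(1000));
console.log(decisions.map((d) => `${d.allowed}:${d.remaining}`).join(" "));
// → true:4 true:3 true:2 true:1 true:0 false:0
```

The whole state is a single timestamp per key, which is why a GCRA check can stay cheap while still reporting `remaining` and `retryAfterMs`.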
| Guard | ns/op | ops/sec | alloc |
|---|---|---|---|
| adaptiveConcurrency (AIMD, single process) | 291 | 3.4M | 1 B/op |
| distributedAdaptiveConcurrency (post-heartbeat, local) | 138 | 7.2M | ~0 B/op |
The distributed guard's steady-state acquire is a lean local in-flight check against its leased share — the coordinator round trip happens off the request path on the heartbeat — so it's cheaper per request than the single-process AIMD controller that samples RTT inline.
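The difference between the two acquire paths can be illustrated with a pair of small classes. Both are hypothetical sketches written for this comparison (the names `AimdSketch` and `LeasedShareSketch`, the halving factor, and the heartbeat callback are assumptions), not the guards' actual implementation.

```typescript
// Hypothetical sketches of the two acquire paths; not the library's code.

// Single-process AIMD: each release feeds an RTT sample into the limit inline.
class AimdSketch {
  limit: number;
  inFlight = 0;
  private readonly targetRttMs: number;

  constructor(initialLimit: number, targetRttMs: number) {
    this.limit = initialLimit;
    this.targetRttMs = targetRttMs;
  }

  tryAcquire(): boolean {
    if (this.inFlight >= Math.floor(this.limit)) return false;
    this.inFlight++;
    return true;
  }

  release(rttMs: number): void {
    this.inFlight--;
    if (rttMs > this.targetRttMs) {
      this.limit = Math.max(1, this.limit * 0.5); // multiplicative decrease
    } else {
      this.limit += 1; // additive increase
    }
  }
}

// Leased share: the request path is one comparison and one increment.
class LeasedShareSketch {
  inFlight = 0;
  private share: number;

  constructor(share: number) {
    this.share = share;
  }

  tryAcquire(): boolean {
    if (this.inFlight >= this.share) return false;
    this.inFlight++;
    return true;
  }

  release(): void {
    this.inFlight--;
  }

  // Called from the heartbeat, off the request path, with the coordinator's answer.
  onHeartbeat(newShare: number): void {
    this.share = newShare;
  }
}

const aimd = new AimdSketch(10, 50);
aimd.tryAcquire();
aimd.release(10); // fast response: limit 10 -> 11
aimd.tryAcquire();
aimd.release(100); // slow response: limit 11 -> 5.5

const leased = new LeasedShareSketch(2);
const acquired = [leased.tryAcquire(), leased.tryAcquire(), leased.tryAcquire()];
leased.onHeartbeat(3); // share grows without touching the request path
const afterHeartbeat = leased.tryAcquire();
console.log(aimd.limit, acquired, afterHeartbeat);
```

In this sketch all limit adjustment for the leased guard happens in `onHeartbeat`, which is the structural reason its per-request cost can be lower than a controller that does its bookkeeping on every release.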
| Contender | Algorithm | API | ns/op | ops/sec |
|---|---|---|---|---|
| throttlekit checkSync | GCRA | sync | 169 | 5.9M |
| throttlekit check | GCRA | async | 301 | 3.3M |
| throttlekit check | fixed-window | async | 300 | 3.3M |
| rate-limiter-flexible | fixed-window | async | 331 | 3.0M |
| express-rate-limit | fixed-window¹ | async | 199 | 5.0M |
¹ express-rate-limit's measured op is its MemoryStore.increment(), a bare counter bump; the limit decision happens later in middleware and is not included.
So ThrottleKit's sync path computes a full GCRA decision in 169 ns, faster than a bare counter increment (199 ns), and on the same async shape its GCRA (301 ns) beats rate-limiter-flexible's counter (331 ns).
| Contender | Algorithm | ops/sec | p50 | p99 | p99.9 |
|---|---|---|---|---|---|
| throttlekit RedisStore | GCRA | 778 | 1.19 ms | 2.39 ms | 3.87 ms |
| rate-limiter-flexible | fixed-window | 752 | 1.23 ms | 2.58 ms | 4.45 ms |
Both do exactly one atomic Lua round trip per request and land within noise — ThrottleKit's proven GCRA transform costs nothing extra over a counter. The absolute latency is the Docker-on-Windows loopback (~1.2 ms p50), not Redis — a same-AZ managed Redis is typically 150–300 µs.
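For readers comparing their own runs against the p50 / p99 / p99.9 columns, tail percentiles can be derived from recorded per-request latencies as sketched below. The nearest-rank method and the function names are assumptions; the source does not say which percentile method the harness uses.

```typescript
// Nearest-rank percentiles over per-request latencies (ms).
// Assumption: illustrative only; the harness may compute percentiles differently.

function percentile(sortedMs: number[], q: number): number {
  if (sortedMs.length === 0) throw new Error("no samples");
  // The small epsilon guards against q * n landing a hair above an integer.
  const rank = Math.ceil(q * sortedMs.length - 1e-9);
  const clamped = Math.min(sortedMs.length, Math.max(1, rank));
  return sortedMs[clamped - 1];
}

function summarize(latenciesMs: number[]): { p50: number; p99: number; p999: number } {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 0.5),
    p99: percentile(sorted, 0.99),
    p999: percentile(sorted, 0.999),
  };
}

// Synthetic samples 1..1000 ms, deliberately supplied in descending order.
const samples = Array.from({ length: 1000 }, (_, i) => 1000 - i);
const summary = summarize(samples);
console.log(summary);
// → { p50: 500, p99: 990, p999: 999 }
```

A p99.9 figure needs at least a few thousand samples to be stable, since it is determined by the slowest 0.1% of requests.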
| Contender | Algorithm | round trips | ops/sec | p50 | p99 |
|---|---|---|---|---|---|
| throttlekit PostgresStore | GCRA | ~5 (txn) | 121 | 7.9 ms | 12.9 ms |
| rate-limiter-flexible | fixed-window | 1 (upsert) | 348 | 2.7 ms | 5.2 ms |
| throttlekit twoTier(leased) | GCRA | 1 / 100 req | 12.3k | 81 µs | — |
An honest split: on a single shared counter, rate-limiter-flexible's specialized one-statement UPSERT (348 ops/sec) beats ThrottleKit's generic read-modify-write transaction (~5 round trips, 121 ops/sec), which reuses the same proven transform across every strategy. Front that same Postgres with twoTier leasing and throughput jumps to 12.3k ops/sec: roughly 100× ThrottleKit's own raw transactional path and ~35× the incumbent's UPSERT, which has no equivalent leasing mode.
| Mode | ops/sec | L2 round trips |
|---|---|---|
| strict GCRA (1 EVALSHA / request) | 783 | 1 per request |
| twoTier(leased), batch 100 | 66.4k | 1 per 100 requests |
Same correctness envelope (the lease math is machine-checked), ~85× the throughput, because 99% of requests never leave the process. Larger batches trade a looser per-node burst for fewer round trips; adaptive lease sizing tunes the batch to observed demand.
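The round-trip arithmetic behind leasing can be shown with a toy model. This is a sketch under simplifying assumptions, not the library's lease math: `InMemoryL2` and `TwoTierSketch` are hypothetical names, the shared tier is a plain permit budget rather than GCRA state, and it is synchronous only so the example is self-contained (a real L2 call would be an async network round trip).

```typescript
// Toy two-tier model: a shared L2 budget fronted by a local lease.
// Hypothetical names; a simple permit counter stands in for the real lease math.

class InMemoryL2 {
  roundTrips = 0;
  private remaining: number;

  constructor(budget: number) {
    this.remaining = budget;
  }

  // Grants up to `batch` permits in a single round trip.
  lease(batch: number): number {
    this.roundTrips++;
    const granted = Math.min(batch, this.remaining);
    this.remaining -= granted;
    return granted;
  }
}

class TwoTierSketch {
  private local = 0; // permits already leased to this process
  private readonly l2: InMemoryL2;
  private readonly batch: number;

  constructor(l2: InMemoryL2, batch: number) {
    this.l2 = l2;
    this.batch = batch;
  }

  check(): boolean {
    // Only go to L2 when the local lease is exhausted.
    if (this.local === 0) this.local = this.l2.lease(this.batch);
    if (this.local === 0) return false;
    this.local--;
    return true;
  }
}

const l2 = new InMemoryL2(1000);
const twoTier = new TwoTierSketch(l2, 100);
const requests = 1000;
let allowed = 0;
for (let i = 0; i < requests; i++) {
  if (twoTier.check()) allowed++;
}
const localFraction = 1 - l2.roundTrips / requests;
console.log({ allowed, roundTrips: l2.roundTrips, localFraction });
```

With a batch of 100, the 1,000 requests cost 10 L2 round trips, so 99% are answered locally. The total admitted never exceeds the shared budget because every local permit was first granted by L2; the cost is that unused permits sit idle in whichever process leased them.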
- Redis/Postgres absolute latency is the local Docker network, not the database: on Windows, Docker Desktop's loopback adds ~1 ms, so the relative shape holds but the absolute p50 does not transfer.
- Sync vs async rows aren't directly comparable: checkSync has no Promise cost; the async GCRA / fixed-window rows are the fair head-to-head.
- Date.now() is cheaper on Linux than Windows, so in-process numbers may be a few percent faster on a Linux runner.
- Single hot key: multi-key effects are deliberately excluded to isolate algorithm cost.
# In-process micro-benchmarks (+ allocations, + Redis):
npm run bench
node --expose-gc --import tsx bench/run.ts --redis
# Head-to-head vs rate-limiter-flexible and express-rate-limit:
npm run bench:compare
The CI bench-regression gate (npm run bench:gate) guards these numbers on every push using a machine-independent relative metric, so a hot-path regression fails the build rather than silently shipping.
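One way a relative gate can stay meaningful across differently sized CI machines is to compare the hot path against a reference loop timed in the same process, then check that ratio against a stored baseline. The sketch below is purely an assumption about what such a metric could look like: the functions `nsPerOp` and `gate`, the reference loop, and the 10% tolerance are hypothetical and may not match what bench:gate computes.

```typescript
// Hypothetical regression gate based on a same-machine ratio.
// Not the project's gate; the metric and thresholds are illustrative.

function nsPerOp(op: () => void, iterations: number): number {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) op();
  return Number(process.hrtime.bigint() - start) / iterations;
}

interface GateResult {
  ratio: number;
  ok: boolean;
}

// The ratio hot/reference cancels out most of the machine's raw speed;
// the gate fails when it exceeds the stored baseline by more than `tolerance`.
function gate(hotNs: number, referenceNs: number, baselineRatio: number, tolerance: number): GateResult {
  const ratio = hotNs / referenceNs;
  return { ratio, ok: ratio <= baselineRatio * (1 + tolerance) };
}

// Deterministic illustration with made-up timings.
const passing = gate(169, 100, 1.69, 0.1); // ratio 1.69, within 10% of baseline
const failing = gate(220, 100, 1.69, 0.1); // ratio 2.2, regression

// Live measurement of a trivial reference loop, to show how the inputs are produced.
let sink = 0;
const referenceNs = nsPerOp(() => { sink++; }, 1_000_000);
console.log(passing, failing, referenceNs > 0);
```

A ratio-based check trades absolute precision for portability: it will not catch a slowdown that affects the hot path and the reference loop equally.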