Benchmarks · ThrottleKit 1.6.1

Measured, not claimed.

Every number here was produced by the harnesses in bench/ and is reproducible on your hardware with the commands at the bottom. The in-process rows are the library's real cost; the Redis/Postgres rows are dominated by the Docker-Desktop-on-Windows round trip on this box, included for relative shape, not as absolute latency you'll see in production.

BENCH.md ↗ Full comparison The two-tier lever

Read this first: methodology

One machine, one day. Here's exactly how.

Best of N independent process launches: the hot rows reproduced to ±1 ns/op across launches; the rest to a few percent.
Warm up, then time: every loop JIT-warms (100k sync / 10k async) before the timed window, so numbers are steady-state, not cold-start.
ALLOW path, single hot key: limits set high enough nothing is ever denied, isolating the algorithm's cost from deny-handling.
Same process, same warmup, same iteration count for every contender. The algorithm each library actually implements is printed next to its row, because a bare counter and a GCRA cell do not give the same guarantee even at the same ops/sec.
Allocations are the net heap delta across the timed loop under --expose-gc, divided by iterations; 0–1 B/op means effectively allocation-free in steady state.
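The warm-up, timed-window, and best-of-N steps above can be sketched as a small harness. This is a minimal illustration, not the code in bench/: `timeOnce`, `bestOf`, and the injectable `Clock` are hypothetical names, and the sketch repeats runs inside one process rather than across independent process launches.

```typescript
// Hypothetical sketch of a warm-then-time harness; not the code in bench/.
type Clock = () => number; // returns milliseconds, may be fractional

interface BenchResult {
  nsPerOp: number;
  opsPerSec: number;
}

// One measurement: warm the JIT, then time a fixed number of iterations.
function timeOnce(fn: () => void, warmup: number, iterations: number, now: Clock): number {
  for (let i = 0; i < warmup; i++) fn(); // untimed warm-up
  const start = now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = now() - start;
  return (elapsedMs * 1e6) / iterations; // ms -> ns, averaged per op
}

// Keep the fastest of several runs. The source uses separate process
// launches; this sketch simply repeats in-process.
function bestOf(
  runs: number,
  fn: () => void,
  warmup = 100000,
  iterations = 1000000,
  now: Clock = () => performance.now(),
): BenchResult {
  let best = Infinity;
  for (let r = 0; r < runs; r++) {
    best = Math.min(best, timeOnce(fn, warmup, iterations, now));
  }
  return { nsPerOp: best, opsPerSec: 1e9 / best };
}
```

The two throughput columns in the tables are related by the same conversion: 1e9 / 170 ns is roughly 5.9M ops/sec.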

CPU

AMD Ryzen AI 9 HX 370 (12c / 24t)

Memory

31.1 GB

OS

Windows 11 (10.0.26200)

Node

v24.13.1

Redis

redis:7-alpine · Docker Desktop · :6380

Postgres

postgres:16 · Docker Desktop · :5433

In-process · MemoryStore

The path that runs before you ever touch a network.

A complete GCRA decision (allowed, limit, remaining, resetAt, retryAfterMs) in 170 ns, with no Promise, no microtask, and no steady-state allocation. checkSync is available whenever the store is synchronous (in-process, or a warmed two-tier lease).

Synchronous checkSync: the fast path
Strategy	ns/op	ops/sec	alloc
gcra	170	5.9M	~1 B/op
tokenBucket	175	5.7M	0 B/op
fixedWindow	217	4.6M	0 B/op
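To make "a complete GCRA decision" concrete, here is a generic textbook-style GCRA check for a single key that returns the five fields named above without any Promise. It is a sketch under assumptions, not ThrottleKit's implementation: `createGcra`, its parameters, and the exact meaning given to `resetAt` are hypothetical, and unlike the measured path it allocates a result object per call.

```typescript
// Generic GCRA sketch. The decision fields follow the shape listed above;
// everything else (createGcra, its parameters) is hypothetical.
interface Decision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAt: number;      // ms timestamp when the key is fully replenished
  retryAfterMs: number; // 0 when allowed
}

function createGcra(limit: number, periodMs: number) {
  const interval = periodMs / limit; // emission interval between requests
  let tat = 0;                       // theoretical arrival time for one hot key

  return function checkSync(now: number): Decision {
    const base = Math.max(tat, now);
    const newTat = base + interval;
    const allowAt = newTat - periodMs; // earliest instant this request conforms

    if (now < allowAt) {
      // Non-conforming request: stored state is left untouched.
      return { allowed: false, limit, remaining: 0, resetAt: base, retryAfterMs: allowAt - now };
    }
    tat = newTat;
    const remaining = Math.floor((periodMs - (newTat - now)) / interval);
    return { allowed: true, limit, remaining, resetAt: newTat, retryAfterMs: 0 };
  };
}
```

With `createGcra(3, 3000)` and three calls at the same instant, the sketch allows all three with `remaining` counting down 2, 1, 0, then denies the fourth with a `retryAfterMs` of one emission interval.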

Concurrency guards: acquire() + release()
Guard	ns/op	ops/sec	alloc
adaptiveConcurrency (AIMD, single process)	277	3.6M	1 B/op
distributedAdaptiveConcurrency (post-heartbeat, local)	136	7.4M	~0 B/op

The distributed guard's steady-state acquire is a lean local in-flight check against its leased share (the coordinator round trip happens off the request path on the heartbeat), so it's cheaper per request than the single-process AIMD controller that samples RTT inline.
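The shape of that steady-state path can be sketched as a guard whose `acquire()` touches only local counters, with the leased share replaced out of band. `LeasedShareGuard` and `onHeartbeat` are hypothetical names for illustration; how the real guard negotiates its share with the coordinator is not shown here.

```typescript
// Hypothetical sketch: a guard whose acquire() only touches local counters.
class LeasedShareGuard {
  private inFlight = 0;
  private share: number;

  constructor(initialShare: number) {
    this.share = initialShare;
  }

  // Request path: one comparison and one increment, no I/O.
  acquire(): boolean {
    if (this.inFlight >= this.share) return false;
    this.inFlight++;
    return true;
  }

  release(): void {
    if (this.inFlight > 0) this.inFlight--;
  }

  // Heartbeat path: the coordinator's reply replaces the local share.
  onHeartbeat(newShare: number): void {
    this.share = newShare;
  }
}
```

Because nothing on the request path samples latency or waits on a reply, the per-call work stays constant regardless of how far away the coordinator is.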

The 2026-06-30 performance sweep

Three hot paths had no benchmark until they needed one.

Before / after (same machine, median of 3)
Path	Before	After	Δ
multiRateLimit.checkSync, 2 GCRA dims	4968 ns	2968 ns	−40%
multiRateLimit.checkSync, 3 GCRA dims	7507 ns	4599 ns	−39%
weightedFairEscrow grant, 8 tenants	182 ns	155 ns	−15%
server RateLimiter.Check handler (in-process)	898 ns	675 ns	−25%

The multi-dimension combine's win comes from skipping a structuredClone of a dimension's state when it holds an immutable primitive; only mutable object or array state still gets cloned. The weighted-fair-escrow win is an O(1) running aggregate gated on integer weights and costs, falling back to the exact rescan for fractional input, since a mutation-order float sum can drift by one ULP and flip a floor. The server Check handler's win comes from building the enforcer with emit:false, so a discarded header set is never computed. All three keep decisions byte-identical, and the gated core checkSync strategies above were deliberately left untouched (already tight).
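Two of those techniques are small enough to sketch. The code below is an illustration under assumptions, not ThrottleKit's source: `snapshotState` and `WeightTotal` are hypothetical names, and the aggregate assumes integer sums stay within the safe-integer range.

```typescript
// Hypothetical sketches of two techniques described above; not ThrottleKit's code.

// 1) Clone only mutable state. Primitives are immutable, so sharing them is safe.
function snapshotState<T>(state: T): T {
  if (state === null || typeof state !== "object") return state;
  return structuredClone(state);
}

// 2) O(1) running total while every weight is an integer; exact rescan otherwise.
class WeightTotal {
  private weights = new Map<string, number>();
  private running = 0;
  private allIntegers = true;

  set(tenant: string, weight: number): void {
    const previous = this.weights.get(tenant) ?? 0;
    this.weights.set(tenant, weight);
    if (!Number.isSafeInteger(weight)) this.allIntegers = false;
    this.running += weight - previous; // exact only for integer inputs
  }

  total(): number {
    if (this.allIntegers) return this.running;
    // Fractional input seen: re-add in a fixed (insertion) order so the
    // result does not depend on the history of updates.
    let sum = 0;
    let integersOnly = true;
    for (const w of this.weights.values()) {
      sum += w;
      if (!Number.isSafeInteger(w)) integersOnly = false;
    }
    if (integersOnly) {
      // Every fractional weight has been replaced: resume the fast path.
      this.running = sum;
      this.allIntegers = true;
    }
    return sum;
  }
}
```

The gate matters because floating-point addition is order-sensitive: in IEEE 754 doubles, (0.1 + 0.2) + 0.3 and (0.3 + 0.2) + 0.1 differ in the last bit, which is enough to change the result of a floor.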

Head-to-head vs. incumbents

Same process, same budget, algorithm printed per row.

Memory tier: in-process, ALLOW path
Contender	Algorithm	API	ns/op	ops/sec
throttlekit checkSync	GCRA	sync	165	6.1M
throttlekit check	GCRA	async	295	3.4M
throttlekit check	fixed-window	async	284	3.5M
rate-limiter-flexible	fixed-window	async	337	3.0M
express-rate-limit	fixed-window¹	async	203	4.9M

¹ express-rate-limit's measured op is its MemoryStore.increment(), a bare counter bump; the limit decision happens later in middleware and is not included. So ThrottleKit's sync path computes a full GCRA decision in 165 ns, less than the 203 ns that bare counter increment takes through its async API, and on the same async shape its GCRA (295 ns) beats rate-limiter-flexible's counter (337 ns).

Redis tier: one atomic round trip, ALLOW path
Contender	Algorithm	ops/sec	p50	p99	p99.9
throttlekit RedisStore	GCRA	745	1.25 ms	2.38 ms	4.81 ms
rate-limiter-flexible	fixed-window	754	1.25 ms	2.44 ms	4.70 ms

Both do exactly one atomic Lua round trip per request and land within noise. ThrottleKit's proven GCRA transform costs nothing extra over a counter. The absolute latency is the Docker-on-Windows loopback (~1.2 ms p50), not Redis: a same-AZ managed Redis is typically 150–300 µs.
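The p50 / p99 / p99.9 columns summarize per-request latency samples. The source does not say which percentile definition the harness uses; the sketch below assumes the nearest-rank method, and `percentile` and `summarize` are hypothetical helper names.

```typescript
// Hypothetical helper: nearest-rank percentile over latency samples (any unit).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based nearest rank
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function summarize(samples: number[]) {
  return {
    p50: percentile(samples, 50),
    p99: percentile(samples, 99),
    p999: percentile(samples, 99.9),
  };
}
```

Note that p99.9 is only meaningful with well over a thousand samples; with fewer, nearest rank simply returns the maximum.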

Postgres tier: single hot key, ALLOW path
Contender	Algorithm	round trips	ops/sec	p50	p99
throttlekit PostgresStore	GCRA	~5 (txn)	118	8.1 ms	13.3 ms
rate-limiter-flexible	fixed-window	1 (upsert)	342	2.8 ms	5.0 ms
throttlekit twoTier(leased)	GCRA	1 / 100 req	9.4k	106 µs	—

An honest split: on a single shared counter, rate-limiter-flexible's specialized one-statement UPSERT beats ThrottleKit's generic read-modify-write transaction (~5 round trips), which reuses the same proven transform across every strategy. Front that same Postgres with twoTier leasing and throughput jumps to 9.4k ops/sec (~80× ThrottleKit's raw PostgresStore path, ~27× the incumbent's upsert), with no equivalent in the incumbent.

The two-tier lever

An exact global limit that doesn't pay a round trip per request.

Leased vs strict: bench/run.ts --redis
Mode	ops/sec	L2 round trips
strict GCRA (1 EVALSHA / request)	735	1 per request
twoTier(leased), batch 100	71.2k	1 per 100 requests

Same correctness envelope (the lease math is machine-checked), ~97× the throughput, because 99% of requests never leave the process. Larger batches trade a looser per-node burst for fewer round trips; adaptive lease sizing tunes the batch to observed demand.
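The leasing idea can be sketched with a local counter that is refilled from the shared store one batch at a time. This is a deliberately simplified illustration: `LeasedLimiter`, `LeaseFn`, and `fixedBudget` are hypothetical, the shared store is a stand-in with a fixed budget rather than GCRA, the lease call is synchronous, and adaptive lease sizing is not modeled.

```typescript
// Hypothetical two-tier sketch: L1 spends a local lease, L2 is asked once per batch.
type LeaseFn = (want: number) => number; // returns how many tokens L2 granted

class LeasedLimiter {
  roundTrips = 0;
  private local = 0;
  private lease: LeaseFn;
  private batch: number;

  constructor(lease: LeaseFn, batch: number) {
    this.lease = lease;
    this.batch = batch;
  }

  check(): boolean {
    if (this.local === 0) {
      this.roundTrips++;
      this.local = this.lease(this.batch); // the only call that leaves the process
    }
    if (this.local === 0) return false; // L2 had nothing left to grant
    this.local--;
    return true;
  }
}

// Stand-in for the shared store: grants from a fixed global budget, never more.
function fixedBudget(total: number): LeaseFn {
  let left = total;
  return (want) => {
    const granted = Math.min(want, left);
    left -= granted;
    return granted;
  };
}
```

With a batch of 100, 250 allowed requests cost three lease calls in this sketch, and the global budget is never exceeded because a node can only spend tokens it was granted. Tokens leased but not yet spent are what loosens the per-node burst as the batch grows.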

Caveats · reproduce

Run them on your hardware.

Redis/Postgres absolute latency is the local Docker network, not the database: on Windows, Docker Desktop's loopback adds ~1 ms; the relative shape holds, the absolute p50 does not transfer.
Sync vs async rows aren't directly comparable, since checkSync has no Promise cost; the async GCRA / fixed-window rows are the fair head-to-head.
Date.now() is cheaper on Linux than Windows, so in-process numbers may be a few percent faster on a Linux runner.
Single hot key means multi-key effects are deliberately excluded to isolate algorithm cost.

# In-process micro-benchmarks (+ allocations, + Redis):
npm run bench
node --expose-gc --import tsx bench/run.ts --redis

# Head-to-head vs rate-limiter-flexible and express-rate-limit:
npm run bench:compare

The CI bench-regression gate (npm run bench:gate) guards these numbers on every push using a machine-independent relative metric, so a hot-path regression fails the build rather than silently shipping.
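The source does not spell out which relative metric the gate uses. One plausible reading, sketched here purely as an assumption, is to express hot-path cost as a ratio to a calibration loop timed on the same runner, so the threshold does not depend on how fast the CI machine is. `Sample`, `relativeCost`, `passesGate`, and the 15% tolerance are all hypothetical.

```typescript
// Hypothetical gate: compare hot-path cost as a ratio to a calibration loop
// timed on the same machine. This is an assumed design, not bench:gate itself.
interface Sample {
  hotNsPerOp: number;         // measured cost of the path under test
  calibrationNsPerOp: number; // measured cost of a fixed reference loop
}

function relativeCost(s: Sample): number {
  return s.hotNsPerOp / s.calibrationNsPerOp;
}

// Pass if the current ratio is within `tolerance` of the recorded baseline ratio.
function passesGate(current: Sample, baselineRatio: number, tolerance = 0.15): boolean {
  return relativeCost(current) <= baselineRatio * (1 + tolerance);
}
```

Under this reading, a runner that is uniformly twice as slow doubles both numbers and leaves the ratio unchanged, while a genuine hot-path regression raises only the numerator.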

Numbers you can re-run. Bounds you can re-check.
