throttlekit
GALE · the distributed bound

Prove what your fleet admits.

Lease a batch of credits, serve them locally at in-process speed — and a one-line refinement called window-coupling collapses worst-case global admissions to exactly the limit, for any number of nodes. Not a tuned heuristic — a bound, machine-checked in TLA⁺.

Why distributed limiting is hard
A shared limiter pays a round trip on every request.

Put the counter in Redis and every node consults it once per request — one network hop in the hot path, paid exactly when traffic (and attacks) spike. Cache locally to avoid that and you lose the global guarantee: N nodes each admit their own share and the real total drifts above the limit.

GALE's two-tier engine fronts the shared tier (L2) with a local one (L1). Most requests are decided locally with zero network; a node reaches for authority only when its local credits run out — and even then, what it can over-admit is held to a hard, provable bound.

the hot path
request → L1 (in-process, sync)
  credits ≥ cost ?  → serve locally, 0 network
  else  → lease a batch from L2 (atomic Lua) → refill L1
The keystone — window-coupling
From "bounded by fleet size" to "exactly the limit."

Leasing's only source of over-admission is carryover: L2 resets to the limit each window, but a node can still hold up to Batch − 1 unconsumed credits from the previous window's last lease. Across N nodes that's up to N·(Batch − 1) extra admissions riding on a fresh full limit.

Baseline leasing

Bounded — but it scales with the fleet.

admitted ≤ Limit + N·(Batch − 1)

Tight, and machine-checked. But the overshoot term grows with the number of nodes N.

Window-coupled

Exactly the limit — at any N.

admitted ≤ Limit  ∀ N

Expire local credits when the window that granted them rolls. The only overshoot term vanishes — the bound stops depending on N.

Proven in spec/GaleWindowCoupledLeasing.tla (MaxAdmitted == Limit) and reproduced by an exhaustive BFS twin that re-checks the same reachable-state invariant in CI on every push. The cost is one re-lease per busy node just after each boundary — a bounded near-boundary utilization dip (liveness, not safety), which is exactly what adaptive lease sizing minimizes.

Three modes — pick your trade per limiter
Exactness, attack-resistance, or throughput.
strict

Exact, every time.

Every check consults L2 — one round trip per request, no local state. The safe default for hard quotas and billing.

cached-deny

Protect the protector.

Allows stay globally exact; denials are cached locally for their retry window. A client flooding a blocked key can't turn that flood into L2 load — the natural default for public endpoints.

leased

Amortized to near-zero.

Lease a batch, serve it locally. ~1 round trip per batch requests — the core of GALE, with the window-coupled bound intact.

The two-tier lever — measured
~85× the throughput, same correctness envelope.
66.4k
ops/sec, leased mode over Redis at batch 100 — 99% of requests never leave the process.
bench/run.ts --redis
783
ops/sec, strict GCRA — one EVALSHA per request. The same proven transform, every hop.
bench/run.ts --redis
1/100
L2 round trips per request at batch 100. The lease math is machine-checked — throughput, not safety, is what changed.
spec/ · CI

Absolute latency is the local Docker-on-Windows loopback, not Redis — the relative shape transfers, the absolute p50 does not. See the full methodology →

Lease sizing — once safety is free
The batch governs only efficiency now.

With window-coupling making safety independent of batch size, the batch trades just two costs: coordination (L2 round trips) against stranding (leased-but-unused credits that expire at the boundary). That's the classic Economic Order Quantity trade — minimized at b* = √(2·orderCost·D / strandPenalty).

Demand D is unknown, skewed across nodes, and drifting, so no fixed batch is right. leaseSizer learns it online with AdaGrad in log-space — O(√T) regret against the best fixed batch in hindsight — and predictiveLeaseSizer adds a Hedge-arbitrated predictor on top. Safety is structurally decoupled: window-coupling caps admissions at the limit for any batch sequence the learner emits, so sizing can never loosen the bound.

Why a little coordination is unavoidable
The trilemma GALE is built to beat.

Any zero-coordination protocol — each node pre-authorizes a local budget and admits against it — obeys a lower bound on worst-case overshoot Δ and under-utilization U:

the lower bound — test/gale/trilemma.test.ts
Δ + N·U ≥ (N − 1)·L

Both corners are ruinous. Exact (Δ = 0) starves a hot node to 1/N of the budget; work-conserving (U = 0) forces every node to hold the full L, so Δ ≥ (N−1)L. You cannot have both for free. Window-coupled leasing reaches the Δ = 0, U ≈ 0 corner the trilemma proves is unreachable at zero coordination — paying only the coordination the bound says is unavoidable.

When the budget is contended
Weighted-fair escrow — who gets the next credit.

weightedFairEscrow splits one budget across tenants in weighted-max-min-fair proportion, work-conserving: an idle tenant's surplus is reclaimed by backlogged ones, but a hard global cap is enforced first so fairness never over-admits. Borrowing is capped at the call's own cost (Deficit Round Robin quantum), so one call can't grab all the slack.

Honesty note: WFE is not strategy-proof — by FairRide (Pu et al., NSDI'16), no shared primitive can be sharing-incentive, work-conserving, and strategy-proof at once. WFE takes the first two; window-coupling bounds a misreporter's gain to a single window.

What proves it · the honest edges
A bound is only as good as its proof.
  • admitted ≤ Limit, independent of N — the one-line window-coupling refinement, TLC-checked. spec/GaleWindowCoupledLeasing.tla
  • The baseline tight boundLimit + N·(Batch−1), with tightness shown via an intentionally-false invariant. spec/DistributedLeasing.tla
  • Re-checked in CI, no Java — a BFS twin reproduces the spec's reachable-state counts, then proves maxAdmitted == L for every N ∈ {1,2,4,8}. test/gale/leasing-variants.test.ts
  • The trilemma + tightnessΔ + N·U ≥ (N−1)L verified for N ∈ {2,3,4}. test/gale/trilemma.test.ts
  • Fairness theorems — water-filling correctness + four fairness theorems machine-checked on 20,000 random instances. test/gale/fair-escrow.test.ts

The edges: leased mode admits a bounded overshoot, not zero — it's exactly the limit only when windowCoupled, and even then a small near-boundary utilization dip remains. lowWater > 0 trades a looser bound for hidden lease latency. On an L2 error the path fails closed (a denied lease surfaces L2's denial). Only strategies with a discrete window boundary participate in window-coupling; pure-rate GCRA/token-bucket lease without it.

Admitted ≤ Limit. Proven — at any fleet size.