Prove what your fleet admits.
Lease a batch of credits, serve them locally at in-process speed — and a one-line refinement called window-coupling collapses worst-case global admissions to exactly the limit, for any number of nodes. Not a tuned heuristic — a bound, machine-checked in TLA⁺.
Put the counter in Redis and every node consults it once per request — one network hop in the hot path, paid exactly when traffic (and attacks) spike. Cache locally to avoid that and you lose the global guarantee: N nodes each admit their own share and the real total drifts above the limit.
GALE's two-tier engine fronts the shared tier (L2) with a local one (L1). Most requests are decided locally with zero network; a node reaches for authority only when its local credits run out — and even then, what it can over-admit is held to a hard, provable bound.
credits ≥ cost ? → serve locally, 0 network
else → lease a batch from L2 (atomic Lua) → refill L1
Leasing's only source of over-admission is carryover: L2 resets to the limit each window, but a node can still hold up to Batch − 1 unconsumed credits from the previous window's last lease. Across N nodes that's up to N·(Batch − 1) extra admissions riding on a fresh full limit.
Bounded — but it scales with the fleet.
Tight, and machine-checked. But the overshoot term grows with the number of nodes N.
Exactly the limit — at any N.
Expire local credits when the window that granted them rolls. The only overshoot term vanishes — the bound stops depending on N.
Proven in spec/GaleWindowCoupledLeasing.tla (MaxAdmitted == Limit) and reproduced by an exhaustive BFS twin that re-checks the same reachable-state invariant in CI on every push. The cost is one re-lease per busy node just after each boundary — a bounded near-boundary utilization dip (liveness, not safety), which is exactly what adaptive lease sizing minimizes.
Exact, every time.
Every check consults L2 — one round trip per request, no local state. The safe default for hard quotas and billing.
Protect the protector.
Allows stay globally exact; denials are cached locally for their retry window. A client flooding a blocked key can't turn that flood into L2 load — the natural default for public endpoints.
Amortized to near-zero.
Lease a batch, serve it locally. ~1 round trip per batch requests — the core of GALE, with the window-coupled bound intact.
leased mode over Redis at batch 100 — 99% of requests never leave the process.EVALSHA per request. The same proven transform, every hop.Absolute latency is the local Docker-on-Windows loopback, not Redis — the relative shape transfers, the absolute p50 does not. See the full methodology →
With window-coupling making safety independent of batch size, the batch trades just two costs: coordination (L2 round trips) against stranding (leased-but-unused credits that expire at the boundary). That's the classic Economic Order Quantity trade — minimized at b* = √(2·orderCost·D / strandPenalty).
Demand D is unknown, skewed across nodes, and drifting, so no fixed batch is right. leaseSizer learns it online with AdaGrad in log-space — O(√T) regret against the best fixed batch in hindsight — and predictiveLeaseSizer adds a Hedge-arbitrated predictor on top. Safety is structurally decoupled: window-coupling caps admissions at the limit for any batch sequence the learner emits, so sizing can never loosen the bound.
Any zero-coordination protocol — each node pre-authorizes a local budget and admits against it — obeys a lower bound on worst-case overshoot Δ and under-utilization U:
Both corners are ruinous. Exact (Δ = 0) starves a hot node to 1/N of the budget; work-conserving (U = 0) forces every node to hold the full L, so Δ ≥ (N−1)L. You cannot have both for free. Window-coupled leasing reaches the Δ = 0, U ≈ 0 corner the trilemma proves is unreachable at zero coordination — paying only the coordination the bound says is unavoidable.
weightedFairEscrow splits one budget across tenants in weighted-max-min-fair proportion, work-conserving: an idle tenant's surplus is reclaimed by backlogged ones, but a hard global cap is enforced first so fairness never over-admits. Borrowing is capped at the call's own cost (Deficit Round Robin quantum), so one call can't grab all the slack.
Honesty note: WFE is not strategy-proof — by FairRide (Pu et al., NSDI'16), no shared primitive can be sharing-incentive, work-conserving, and strategy-proof at once. WFE takes the first two; window-coupling bounds a misreporter's gain to a single window.
- admitted ≤ Limit, independent of N — the one-line window-coupling refinement, TLC-checked. spec/GaleWindowCoupledLeasing.tla
- The baseline tight bound —
Limit + N·(Batch−1), with tightness shown via an intentionally-false invariant. spec/DistributedLeasing.tla - Re-checked in CI, no Java — a BFS twin reproduces the spec's reachable-state counts, then proves
maxAdmitted == Lfor every N ∈ {1,2,4,8}. test/gale/leasing-variants.test.ts - The trilemma + tightness —
Δ + N·U ≥ (N−1)Lverified for N ∈ {2,3,4}. test/gale/trilemma.test.ts - Fairness theorems — water-filling correctness + four fairness theorems machine-checked on 20,000 random instances. test/gale/fair-escrow.test.ts
The edges: leased mode admits a bounded overshoot, not zero — it's exactly the limit only when windowCoupled, and even then a small near-boundary utilization dip remains. lowWater > 0 trades a looser bound for hidden lease latency. On an L2 error the path fails closed (a denied lease surfaces L2's denial). Only strategies with a discrete window boundary participate in window-coupling; pure-rate GCRA/token-bucket lease without it.