GALE · Globally-Accounted Learned Escrow

Prove what your fleet admits.

Lease a batch of credits, serve them locally at in-process speed, and a one-line refinement called window-coupling collapses worst-case global admissions to exactly the limit, for any number of nodes. Not a tuned heuristic: a bound, machine-checked in TLA⁺.

Read the GALE design → Benchmarks TALE — the cost axis

Why distributed limiting is hard

A shared limiter pays a round trip on every request.

Put the counter in Redis and every node consults it once per request. That's one network hop in the hot path, paid exactly when traffic (and attacks) spike. Cache locally to avoid that and you lose the global guarantee: N nodes each admit their own share and the real total drifts above the limit.

GALE's two-tier engine fronts the shared tier (L2) with a local one (L1). Most requests are decided locally with zero network; a node reaches for authority only when its local credits run out, and even then, what it can over-admit is held to a hard, provable bound.

the hot path

request → L1 (in-process, sync)
credits ≥ cost ? → serve locally, 0 network
else → lease a batch from L2 (atomic Lua) → refill L1

The keystone: window-coupling

From "bounded by fleet size" to "exactly the limit."

Leasing's only source of over-admission is carryover: L2 resets to the limit each window, but a node can still hold up to Batch − 1 unconsumed credits from the previous window's last lease. Across N nodes that's up to N·(Batch − 1) extra admissions riding on a fresh full limit.

Baseline leasing

Bounded, but it scales with the fleet.

admitted ≤ Limit + N·(Batch − 1)

Tight, and machine-checked. But the overshoot term grows with the number of nodes N.

Window-coupled

Exactly the limit, at any N.

admitted ≤ Limit ∀ N

Expire local credits when the window that granted them rolls. The only overshoot term vanishes, so the bound stops depending on N.

Proven in spec/GaleWindowCoupledLeasing.tla (MaxAdmitted == Limit) and reproduced by an exhaustive BFS twin that re-checks the same reachable-state invariant in CI on every push. The cost is one re-lease per busy node just after each boundary: a bounded near-boundary utilization dip (liveness, not safety), which is exactly what adaptive lease sizing minimizes.

Three modes, pick your trade per limiter

Exactness, attack-resistance, or throughput.

strict

Exact, every time.

Every check consults L2: one round trip per request, no local state. The safe default for hard quotas and billing.

cached-deny

Protect the protector.

Allows stay globally exact; denials are cached locally for their retry window. A client flooding a blocked key can't turn that flood into L2 load, which makes it the natural default for public endpoints.

leased

Amortized to near-zero.

Lease a batch, serve it locally: ~1 round trip per batch requests, the core of GALE, with the window-coupled bound intact.

The two-tier lever, measured

~97× the throughput, same correctness envelope.

71.2k

ops/sec, leased mode over Redis at batch 100. 99% of requests never leave the process.

bench/run.ts --redis

735

ops/sec, strict GCRA: one EVALSHA per request. The same proven transform, every hop.

bench/run.ts --redis

1/100

L2 round trips per request at batch 100. The lease math is machine-checked; throughput, not safety, is what changed.

spec/ · CI

Absolute latency is the local Docker-on-Windows loopback, not Redis. The relative shape transfers; the absolute p50 does not. See the full methodology →

Lease sizing, once safety is free

The batch governs only efficiency now.

With window-coupling making safety independent of batch size, the batch trades just two costs: coordination (L2 round trips) against stranding (leased-but-unused credits that expire at the boundary). That's the classic Economic Order Quantity trade, minimized at b* = √(2·orderCost·D / strandPenalty).

Demand D is unknown, skewed across nodes, and drifting, so no fixed batch is right. leaseSizer learns it online with AdaGrad in log-space (O(√T) regret against the best fixed batch in hindsight), and predictiveLeaseSizer adds a Hedge-arbitrated predictor on top. Safety is structurally decoupled: window-coupling caps admissions at the limit for any batch sequence the learner emits, so sizing can never loosen the bound.

Why a little coordination is unavoidable

The trilemma GALE is built to beat.

Any zero-coordination protocol (each node pre-authorizes a local budget and admits against it) obeys a lower bound on worst-case overshoot Δ and under-utilization U:

the lower bound: test/gale/trilemma.test.ts

Δ + N·U ≥ (N − 1)·L

Both corners are ruinous. Exact (Δ = 0) starves a hot node to 1/N of the budget; work-conserving (U = 0) forces every node to hold the full L, so Δ ≥ (N−1)L. You cannot have both for free. Window-coupled leasing reaches the Δ = 0, U ≈ 0 corner the trilemma proves is unreachable at zero coordination, paying only the coordination the bound says is unavoidable.

When the budget is contended

Weighted-fair escrow: who gets the next credit.

weightedFairEscrow splits one budget across tenants in weighted-max-min-fair proportion, work-conserving: an idle tenant's surplus is reclaimed by backlogged ones, but a hard global cap is enforced first so fairness never over-admits. Borrowing is capped at the call's own cost (Deficit Round Robin quantum), so one call can't grab all the slack.

Honesty note: WFE is not strategy-proof. By FairRide (Pu et al., NSDI'16), no shared primitive can be sharing-incentive, work-conserving, and strategy-proof at once. WFE takes the first two; window-coupling bounds a misreporter's gain to a single window.

What proves it · the honest edges

A bound is only as good as its proof.

admitted ≤ Limit, independent of N: the one-line window-coupling refinement, TLC-checked. spec/GaleWindowCoupledLeasing.tla
The baseline tight bound is Limit + N·(Batch−1), with tightness shown via an intentionally-false invariant. spec/DistributedLeasing.tla
Re-checked in CI, no Java: a BFS twin reproduces the spec's reachable-state counts, then proves maxAdmitted == L for every N ∈ {1,2,4,8}. test/gale/leasing-variants.test.ts
The trilemma + tightness (Δ + N·U ≥ (N−1)L) verified for N ∈ {2,3,4}. test/gale/trilemma.test.ts
Fairness theorems. Water-filling correctness + four fairness theorems, machine-checked on 20,000 random instances. test/gale/fair-escrow.test.ts

The edges: leased mode admits a bounded overshoot, not zero. It's exactly the limit only when windowCoupled, and even then a small near-boundary utilization dip remains. lowWater > 0 trades a looser bound for hidden lease latency. On an L2 error the path fails closed (a denied lease surfaces L2's denial). Only strategies with a discrete window boundary participate in window-coupling; pure-rate GCRA/token-bucket lease without it.

Admitted ≤ Limit. Proven, at any fleet size.

Full GALE design → TALE — govern cost Benchmarks GitHub ↗