04 · Two-tier engine & leasing (GALE)

The distributed engine: how a fleet enforces one global limit without paying a network round trip per request, and how the over-admission is held to a hard, fleet-size-independent bound. Internally this is GALE (provable distributed leasing). Source: src/twotier/, spec/, docs/FORMAL-MODEL.md.

Purpose

A purely distributed limiter pays one round trip to the shared store on every request: the wrong cost exactly when traffic (and attacks) spike. The two-tier engine fronts the distributed tier (L2) with a local tier (L1) and lets you choose the consistency/throughput trade per limiter. Its headline mode, leased, collapses steady-state network cost to roughly one round trip per batch requests while keeping a provable bound on how far global admissions can exceed the limit. With window-coupling, that bound becomes exactly the limit, independent of the number of nodes.

request ─► L1 (in-process, sync)
              │  decisive locally?  ──► return (0 network)
              │  needs authority?   ──► L2 (Redis, atomic Lua) ──► update L1
              ▼
           Decision

The three modes

twoTier({ strategy, l2, mode, lease? }) returns a Limiter (src/twotier/index.ts:163):

strict: every check consults L2 directly. Exact, 1 RTT/request, no local state. The safe default for hard quotas and billing.
cached-deny: allowed traffic still hits L2 (so allows stay globally exact), but denials are cached locally for their retryAfterMs. Once a key is over the limit, further requests are rejected from L1 with no round trip, so an abusive client flooding a blocked key can’t translate that flood into L2 load. This is the “protect the protector” mode, a natural default for public endpoints.
leased: each node atomically leases a batch of credits from L2 and serves them from a local counter. The core of GALE.

The lease lifecycle (leased mode)

All per-key L1 state is one LeaseEntry record (src/twotier/index.ts:133). The check loop (src/twotier/index.ts:343):

Serve locally: if (credits ≥ cost) { credits -= cost; return allow }. A local hit, no network.
Lease on shortage: when credits < cost, atomically lease max(leaseSize, cost) from L2; on success, add the grant to local credits and retry.
Coalesce concurrent misses: the in-flight lease promise is stored on the entry; concurrent misses on the same key await it instead of issuing their own lease. This guarantees a node never holds more than batch outstanding (the assumption the overshoot bound rests on) and stops a cold hot-key from stampeding L2.
Surface global exhaustion: when L2 is globally out of budget and nothing remains locally, the node returns L2’s denial.

lowWater > 0 (opt-in) fires a fire-and-forget refill once local credits drop to the threshold, so requests never block on the network, at the cost of a looser bound. returnIdleAfterMs drops keys whose local credits have sat idle, returning that capacity to L2 for other nodes; its timer is owned by the limiter and released on close().

The overshoot bound, and why window-coupling makes it fleet-size-independent

The baseline leased mode (credits carry across window boundaries) has a tight bound, machine-checked in spec/DistributedLeasing.tla and docs/FORMAL-MODEL.md:

admitted_per_window ≤ Limit + N · (Batch − 1)        // N = number of nodes

The carryover is the sole source of overshoot: L2 resets to Limit each window, but each node may still hold up to Batch − 1 unconsumed local credits from the prior window’s last lease. Across N nodes that is up to N·(Batch − 1) extra admissions riding on a fresh full Limit. (The library’s documented ≤ L × batch is the safe headline; the proof sharpens it by N.)

Window-coupling is the one-line refinement that removes it. When the L2 window that granted a key’s credits rolls (now ≥ lastDecision.resetAt), the local credits expire instead of carrying over (src/twotier/index.ts:358). Because cross-window carryover was the only overshoot source, the per- window bound collapses to:

admitted_per_window ≤ Limit          for ANY number of nodes N

This is the load-bearing result: the bound stops depending on N because there is no longer any per-node leftover to sum across nodes. It is proven in spec/GaleWindowCoupledLeasing.tla (MaxAdmitted == Limit) and reproduced by an exhaustive BFS twin that runs in CI. The cost is one re-lease per busy node just after each boundary: a bounded near-boundary utilization dip (a liveness, not a safety, concern), which is exactly what adaptive lease sizing minimizes.

EOQ lease sizing: trading round trips against stranded budget

Once window-coupling makes safety independent of the batch size, the batch governs only efficiency: coordination cost (L2 round trips) versus stranding (leased-but-unused credits that expire at the boundary). For a node serving demand D per window, that is the classic Economic Order Quantity trade (src/twotier/sizing.ts:11):

f_D(b) = orderCost · D / b   +   strandPenalty · b / 2
       =  (coordination)      +   (stranding)

minimized at b* = √(2 · orderCost · D / strandPenalty). Because D is unknown, skewed across nodes, and non-stationary, no fixed b is right, so leaseSizer (src/twotier/sizing.ts:98) learns it online with AdaGrad in log-space (it optimizes x = ln b, since the optimum spans orders of magnitude). It attains O(√T) regret against the best fixed batch in hindsight and tracks a drifting optimum; the engine runs one learner per key, feeding it each window’s served demand and reading back the next batch size. predictiveLeaseSizer adds a prediction-augmented sibling: two experts (“follow the predicted demand” and the robust leaseSizer) arbitrated by a Hedge meta-learner, so accurate predictions approach the clairvoyant optimum while bad ones fall back to the no-regret bound.

Why AdaGrad here (vs OGD on the cost axis, 05): the EOQ gradient is unbounded and smooth, which is exactly where AdaGrad’s per-coordinate adaptivity earns its keep; the cost axis’s pinball gradient is bounded and constant-magnitude, where vanilla OGD is regret-optimal. Same family, the tool calibrated to the gradient’s shape.

Safety is decoupled from the learner. Window-coupling caps per-window admissions at Limit for any batch sequence the learner emits, so adaptive sizing can only trade coordination against stranding. It can never loosen the cap (src/twotier/index.ts:84).

The trilemma: why coordination is unavoidable

GALE is framed by a one-line lower bound (test/gale/trilemma.test.ts). Any zero-coordination protocol (each node pre-authorizes a local budget and admits against it) has worst-case overshoot Δ and under-utilization U satisfying:

Δ + N · U ≥ (N − 1) · L          (tight at the uniform allocation)

Both corners are ruinous: an exact protocol (Δ = 0) starves a hot node to 1/N of the budget (U ≥ L(N−1)/N); a work-conserving one (U = 0) forces every node to hold the full L, so Δ ≥ (N−1)L. You cannot have both at zero coordination. This is why leasing spends a little coordination: window-coupled leasing reaches the Δ = 0, U ≈ 0 corner the trilemma proves is unreachable for free, paying only the coordination the bound says is unavoidable. Two interpolations are also proven and machine-checked (static-partition and dynamic-leasing variants).

Weighted-fair escrow: who gets the next credit

When the budget is contended, weightedFairEscrow (src/twotier/weighted-fair-escrow.ts) splits one budget L across tenants in weighted-max-min-fair proportion, work-conserving (an idle tenant’s surplus is reclaimed by backlogged ones). The decide logic is hierarchical: a hard global cap first (never over-admit regardless of fairness), then “within your guaranteed share,” then a borrow phase where a tenant over its guarantee can take surplus other tenants haven’t claimed, capped at the call’s own cost (Deficit Round Robin quantum semantics), which bounds the unfairness so one call can’t grab all the slack. In multi-process mode the effective budget is grown lazily by leasing quanta from a shared fixedWindow counter, whose atomicity bounds the global total at L.

federatedWeightedFairEscrow (covered in 08) composes two levels of this to make each tenant’s global total match the flat global weighted-max-min ideal across regions.

Honesty note: WFE is not strategy-proof: per FairRide (Pu et al., NSDI’16), no shared primitive can be sharing-incentive, work-conserving, and strategy-proof at once. WFE takes the first two; window- coupling bounds a misreporter’s gain to a single window.

Tier-2: the client-held lease (service-door scale ceiling)

Everything above is an in-process L1 fronting an in-process-reachable L2 (a Redis the node can EVAL against). When the distributed engine lives behind the gRPC service instead, a high-throughput client pays one Check/Debit round trip per request, re-introducing exactly the cost the leased mode exists to amortize, now across a network the client can’t EVAL over. Tier-2 lifts the leased-mode amortization across that service boundary: the client leases a chunk of a policy’s global per-window budget over one RPC and serves requests locally, round-tripping only to refresh.

One oracle, two roles. The split is the load-bearing contract:

The server sizes the grant. A client calls the gRPC Fleet.Reserve door (a federated: policy; 13, 14), and the server (the one oracle) computes the grant size through that policy’s federation coordinator (coordinator.lease / leaseWindowed, 08). It returns a Lease{capacity, expiry_ms, refresh_interval_ms, safe_capacity, retry_after_ms, limit}.
The client only spends it. The client runs the core LeaseSpender (throttlekit/twotier, src/twotier/lease-spender.ts), a verbatim port of the twoTier(mode:"leased", lease:{windowCoupled:true}) L1 spend: applyLease (add capacity to local credits, couple them to expiry_ms) → spend (subtract and synthesize an allow byte-identical to the core L1 synthAllow) → discard the remainder when the window rolls. It never invents a denial: on a local shortfall it signals needsRefresh, and the server’s authoritative Decision is surfaced verbatim when a refresh can’t be granted.

One oracle therefore holds iff the client’s local spend is byte-identical to the core’s L1 leased spend. That equality is not assumed: it is pinned by a new lease golden-vector suite (wire/vectors, 13) and proven against the shipped twoTier leased path by the lease-spender conformance test. LeaseSpender.spend is pure, synchronous (now injected per call, like every core algorithm), and portable to any polyglot client, and runs at ≈ 10 ns/op, so the local path is effectively free against the RPC it replaces.

The four hazards, and where each is closed:

Clock skew. expiry_ms is the server/store window boundary (the useServerTime window the coordinator drained the grant against), not the client’s clock. The client treats it as authoritative and never extends it, so a client clock running fast or slow can’t carry a credit past the instant the global budget actually reset. (leaseWindowed returns this boundary atomically with the grant, closing the node↔store skew the old lease()’s ignored expiresAt left open — 08.)
Window alignment. When now >= expiry_ms, LeaseSpender discards the leftover credits rather than carrying them across the boundary (expireIfRolled, folded into every spend). Cross-window carryover is the sole source of leased overshoot; discarding it holds the per-window global total to exactly the limit, independent of how many clients lease concurrently: the Tier-2 lift of the window-coupling result above.
Refresh race. Concurrent local misses on one key must not each fire their own Reserve; the client coalesces them onto one in-flight refresh (the same “at most one outstanding lease” coalescing the in-process leased path uses to bound a node to batch outstanding).
Partial grant. capacity may be < wants (other clients raced the budget down), and 0 is a refusal (the global budget is spent). spendOrRefresh treats a zero-capacity grant without a denial as a contract violation by reserve() and fails loudly rather than spinning; a real refusal surfaces the server’s Decision with its retry_after_ms.

The lease data lives only on the Fleet service: the Decision message is never extended with lease fields (reserved 6..15; 13). Tier-2 is the service-door analog of leased mode, not a new algorithm: the math is the in-process leased path, relocated across the wire with the oracle held on the server.

Design decisions & rationale

leased is opt-in, not the default. It trades exactness for network amortization; strict (exact) is the safe default and leasing is the tunable lever for hot keys where RTT-per-request dominates.
Window-coupling is the keystone. Expiring credits at the boundary removes the only overshoot term that summed across nodes, converting a Limit + N·(Batch−1) bound into exactly Limit.
EOQ + an online learner, because the coordination-vs-stranding trade is a textbook order-cost-vs- holding-cost problem and the demand that parameterizes it is unknown, skewed, and drifting.
Safety structurally decoupled from learning: the escrow store holds the hard per-window cap, so no learner or predictor can breach it; the regret results govern only efficiency.

Caveats

Leased mode admits a bounded overshoot, not zero, unless windowCoupled, where it is exactly Limit (and even then a bounded near-boundary utilization dip remains). lowWater > 0 trades a looser bound for hidden lease latency.
cached-deny consistency and single-process WFE fairness are per-process; set maxKeys on public surfaces so the L1 maps can’t grow unbounded.
On an L2 error the lease path leaves credits as-is and the next check leases synchronously; a denied lease surfaces L2’s denial (fail-closed).
Only strategies with a discrete window boundary participate in window-coupling; pure-rate GCRA/token- bucket lease without it.

What proves it

spec/DistributedLeasing.tla: the baseline tight bound Limit + N·(Batch−1) (TLC-checked; tightness via an intentionally-false invariant).
spec/GaleWindowCoupledLeasing.tla: the one-line refinement proving admitted ≤ Limit independent of N.
test/gale/leasing-variants.test.ts: the Java-free BFS twin: reproduces the spec’s reachable-state counts (harness validation), then proves windowCoupled maxAdmitted == L for every N ∈ {1,2,4,8}.
test/gale/trilemma.test.ts: exhaustively verifies Δ + N·U ≥ (N−1)L and tightness for N ∈ {2,3,4}.
test/gale/lease-sizer.test.ts: eoqOptimum, sublinear regret, beats best-fixed under drift, and Pillar 1 ⊕ 2 safety (admitted ≤ limit for any learned sizes).
test/gale/predictive-sizer.test.ts: Hedge consistency/robustness/unconditional safety.
test/gale/fair-escrow.test.ts: water-filling correctness + four fairness theorems machine-checked on 20,000 random instances.
lease-spender conformance + the lease golden-vector suite (wire/vectors): pin LeaseSpender.spend byte-identical to the shipped twoTier leased L1 path, so the Tier-2 client spend can’t diverge from the one oracle.

Source map

src/twotier/index.ts (the engine + three modes) · sizing.ts (leaseSizer, predictiveLeaseSizer) · weighted-fair-escrow.ts · federated-weighted-fair-escrow.ts (08) · lease-spender.ts (the Tier-2 client-held lease spend; door + shape in 13 / 14) · spec/DistributedLeasing.tla, spec/GaleWindowCoupledLeasing.tla · docs/FORMAL-MODEL.md.