About

Held to a proof, not a hope.

Most rate limiters do the easy 10% (count requests on one box) and leave the hard parts to luck: what your fleet actually admits, and what your LLM actually spends. ThrottleKit treats both as quantities to bound and verify, then ships the bounds as ordinary features.

GitHub ↗ Design docs

Why it exists

The gap nobody fills.

A single-process counter is a solved problem; a dozen good libraries handle it. But the moment you run more than one box, "the limit" becomes approximate: each node admits its own share, and the fleet-wide total drifts above what you set. And the moment your cost is an LLM completion, counting requests misses the point entirely. What you actually pay for is tokens, and their total is known only after the stream ends.

Those are the two places ThrottleKit is built for. It is not a faster counter. It is a limiter that can show, with a machine-checked bound, that admissions never exceed the limit no matter how many nodes you run, and one that can govern token spend with overshoot independent of the cap. The rest of a complete limiter is here too, but those two are the reason it exists.

How it's built

Four commitments.

Machine checking is a verification technique, not a formality. The load-bearing bounds live in TLA⁺ specs model-checked with TLC, and in a dependency-free twin that re-derives the same invariants in CI on every push. Both are treated as exhaustive testing, because that is what they are.
One verified core, everywhere. Strategies are pure functions of time, compiled to an atomic Lua form proven bit-identical across six stores. Adding a backend means implementing one primitive; adding an algorithm never touches a store. The Python client does no limiter math of its own: it defers to the same oracle.
Safety is decoupled from cleverness. The online learners (lease sizing, token reservation) affect only efficiency; the hard cap is held structurally, so no predictor, however adversarial, can breach it.
Honest edges, in writing. Every component doc ends with its caveats and failure behavior. The benchmarks lead with methodology and say plainly where an incumbent wins. Every number traces to a harness you can re-run.

The two engines

Research that ships as features.

GALE · the placement axis

Provable distributed leasing.

Nodes lease credits and serve locally, and window-coupling caps worst-case global admissions at exactly the limit, independent of fleet size. Machine-checked in TLA⁺.

How GALE works →

TALE · the cost axis

Token-budget escrow for LLMs.

Output tokens are metered as they stream, and generation stops at the budget boundary, with overshoot independent of max_tokens and concurrency. At per-token granularity, the overshoot Δ is 0.

How TALE works →

Status

A stable 1.x core, an open frontier.

The core (strategies, stores, adapters, the two-tier engine) is shipped and stable under SemVer. Pieces on the evolving frontier are marked experimental and excluded from the API stability guarantee, so you always know what you're depending on. ThrottleKit is polyglot from one verified core: Node and Python today (Python through a thin client that defers to the same oracle), with more languages to follow.

Stability policy ↗ Architecture Benchmarks

Who & license

MIT. Developed in the open.

ThrottleKit is built by Ameya Borkar and released under the MIT license, free to use, inspect, and build on. The design is documented component by component, the proofs are in the repo, and the benchmarks run on your own hardware. If you find an edge the docs don't cover, the issue tracker is open.

GitHub ↗ npm ↗ PyPI ↗ Docs ↗