Rate Limiter¶
frame/ratelimiter now provides four admission-control primitives.
They are intentionally different because production overload does not show up in
just one way.
If you only remember one rule, remember this one:
Pick the limiter that matches the resource that fails first.
If you pick the wrong limiter, the code may still look correct while the system fails under load.
Quick Choice Guide¶
| If the real problem is... | Use | Why |
|---|---|---|
| One tenant, IP, or API key can send too many requests in a time window | WindowLimiter | Fixed-window quota per shared key |
| Same as above, but the cache increment itself becomes a bottleneck at high volume | LeasedWindowLimiter | Same quota semantics, fewer remote cache operations |
| Too many expensive operations run at the same time in one process | ConcurrencyLimiter | Caps local in-flight work |
| Backlog in a queue, outbox, or worker pipeline is already too high | QueueDepthLimiter | Stops admitting more work until backlog is healthy |
Do Not Confuse These Limiters¶
WindowLimiter¶
Use WindowLimiter when you want a straightforward, shared, distributed budget
per key and the cache backend can comfortably absorb one atomic increment per
request.
Correct use:
- per-tenant API request budgets
- per-user or per-IP HTTP protection
- moderate-volume internal service quotas
Wrong use:
- protecting CPU-heavy work that fails because too many tasks run at once
- protecting queue backlog growth
- very high-volume hot keys where cache traffic becomes expensive
Example:
```go
raw, err := svc.GetRawCache("redis")
if err != nil {
    return err
}
limiter, err := ratelimiter.NewWindowLimiter(raw, &ratelimiter.WindowConfig{
    WindowDuration: time.Minute,
    MaxPerWindow:   500,
    KeyPrefix:      "tenant:api",
    FailOpen:       false,
})
if err != nil {
    return err
}
if !limiter.Allow(ctx, tenantID) {
    return connect.NewError(connect.CodeResourceExhausted, errors.New("tenant request budget exceeded"))
}
```
LeasedWindowLimiter¶
Use LeasedWindowLimiter when the semantics of a fixed-window quota are still
correct, but a remote increment on every request would create a hot-key or
cache-throughput problem.
The limiter works by reserving quota from the shared cache in chunks, then
serving several local Allow calls from that reservation before touching the
cache again.
Correct use:
- very hot per-tenant ingest limits
- webhook ingress or event ingest with sustained high volume
- cases where you still want a distributed fixed-window budget
Wrong use:
- exact per-request observability of every increment at the cache layer
- low-volume endpoints where plain WindowLimiter is simpler
- concurrency protection for expensive local work
Guidance:
- Use the default reservation size unless profiling shows it is wrong.
- Larger reservations reduce cache traffic but make each process hold more local quota at once.
- Smaller reservations increase cache traffic but reduce local over-reservation.
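For a rough sense of the tradeoff, using the values from the example below: with MaxPerWindow of 1,000,000 and ReservationSize of 1024, a process that consumes its full share touches the cache roughly 1,000,000 / 1024 ≈ 977 times per window instead of performing 1,000,000 remote increments, at the cost of holding up to 1024 unspent local permits at any moment.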
Example:
```go
raw, err := svc.GetRawCache("redis")
if err != nil {
    return err
}
limiter, err := ratelimiter.NewLeasedWindowLimiter(raw, &ratelimiter.WindowConfig{
    WindowDuration:  time.Minute,
    MaxPerWindow:    1_000_000,
    KeyPrefix:       "tenant:event-ingest",
    FailOpen:        false,
    ReservationSize: 1024,
})
if err != nil {
    return err
}
if !limiter.Allow(ctx, tenantID) {
    return connect.NewError(connect.CodeResourceExhausted, errors.New("tenant ingest budget exceeded"))
}
```
ConcurrencyLimiter¶
Use ConcurrencyLimiter when the resource you are protecting is local and
finite. This is about simultaneous work, not requests per minute.
Correct use:
- capping in-flight connector calls in a worker
- bounding CPU-heavy transforms
- limiting database-heavy handlers in one process
- preventing one process from fan-out exploding into thousands of active goroutines
Wrong use:
- global tenant fairness across many replicas
- distributed rate limiting across a fleet
- backlog admission at producer boundaries
Important behavior:
- The limit is process-local.
- Ten pods with a limit of 100 each can run about 1000 concurrent operations in total.
- TryAcquire is for fail-fast behavior.
- Acquire is for bounded waiting and should usually be paired with a context deadline.
Example:
```go
connectorLimiter, err := ratelimiter.NewConcurrencyLimiter(128)
if err != nil {
    return err
}
permit, ok := connectorLimiter.TryAcquire()
if !ok {
    return ratelimiter.ErrConcurrencyLimitReached
}
defer permit.Release()
return connector.Execute(ctx, req)
```
If the caller should wait briefly for capacity:
```go
connectorLimiter, err := ratelimiter.NewConcurrencyLimiter(128)
if err != nil {
    return err
}
ctx, cancel := context.WithTimeout(ctx, 250*time.Millisecond)
defer cancel()
permit, err := connectorLimiter.Acquire(ctx)
if err != nil {
    return err
}
defer permit.Release()
return connector.Execute(ctx, req)
```
QueueDepthLimiter¶
Use QueueDepthLimiter when the best overload signal is not raw request rate,
but backlog in a downstream work queue, outbox, retry queue, or scheduler
pipeline.
This limiter is an admission controller. It does not smooth traffic. It simply decides whether more work should be accepted right now.
Correct use:
- stopping event ingest when outbox backlog is unsafe
- pausing enqueue when worker queue depth is already too high
- preventing retry storms from making backlog worse
Wrong use:
- tenant fairness
- abuse protection
- shaping requests into a stable average rate
Important behavior:
- RejectAtDepth closes admission.
- ResumeAtDepth reopens admission.
- ResumeAtDepth must be lower than RejectAtDepth. That gap is deliberate hysteresis so the system does not flap between open and closed every few milliseconds.
- RefreshInterval exists because depth lookups are often remote calls and should not happen on every request.
Example:
```go
depthLimiter, err := ratelimiter.NewQueueDepthLimiter(
    func(ctx context.Context) (int64, error) {
        return queue.Pending(ctx, "workflow-events")
    },
    ratelimiter.QueueDepthConfig{
        RejectAtDepth:   250_000,
        ResumeAtDepth:   150_000,
        RefreshInterval: 500 * time.Millisecond,
        FailOpen:        false,
    },
)
if err != nil {
    return err
}
if !depthLimiter.Allow(ctx) {
    return connect.NewError(connect.CodeResourceExhausted, errors.New("event backlog too high"))
}
```
Recommended Layering¶
Many real systems need more than one limiter.
For example, a high-throughput event ingest service commonly needs:
- LeasedWindowLimiter at ingress for tenant fairness.
- QueueDepthLimiter before enqueue so backlog cannot grow without bound.
- ConcurrencyLimiter in workers or connector executors so expensive local work does not saturate the process.
These are complementary, not redundant.
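As a rough sketch of how these layers might fit together, the fragment below assumes the three limiters were constructed exactly as in the earlier examples; Event, enqueueEvent, and processEvent are illustrative placeholders, not part of frame/ratelimiter:

```go
// Ingress path: per-tenant fairness first, then backlog admission, then enqueue.
func admitEvent(ctx context.Context, tenantID string, evt Event) error {
    if !tenantLimiter.Allow(ctx, tenantID) { // LeasedWindowLimiter from the ingest example
        return connect.NewError(connect.CodeResourceExhausted, errors.New("tenant ingest budget exceeded"))
    }
    if !depthLimiter.Allow(ctx) { // QueueDepthLimiter from the backlog example
        return connect.NewError(connect.CodeResourceExhausted, errors.New("event backlog too high"))
    }
    return enqueueEvent(ctx, evt) // hypothetical enqueue helper
}

// Worker path: cap expensive local work regardless of what ingress admitted.
func handleEvent(ctx context.Context, evt Event) error {
    permit, ok := workLimiter.TryAcquire() // ConcurrencyLimiter, process-local
    if !ok {
        return ratelimiter.ErrConcurrencyLimitReached
    }
    defer permit.Release()
    return processEvent(ctx, evt) // hypothetical processing helper
}
```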
Failure-Mode Guidance¶
Choose FailOpen=true versus FailOpen=false deliberately:
- Fail open when temporary inability to measure should not stop user traffic.
- Fail closed when the protected downstream system is sensitive enough that admitting work without measurement is too risky.
Examples:
- Public webhook ingest commonly fails open for request-rate checks but may fail closed for severe backlog admission.
- Financially sensitive connector execution often fails closed on local concurrency or dependency health checks.
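As an illustrative pairing of those two defaults, reusing the config types shown above with hypothetical values:

```go
// Public webhook ingress: if the rate-limit cache cannot be reached,
// keep serving traffic rather than blocking users.
ingressCfg := &ratelimiter.WindowConfig{
    WindowDuration: time.Minute,
    MaxPerWindow:   500,
    KeyPrefix:      "webhook:ingress",
    FailOpen:       true,
}

// Backlog admission: if queue depth cannot be measured, stop admitting
// new work rather than growing an unmeasured backlog.
backlogCfg := ratelimiter.QueueDepthConfig{
    RejectAtDepth:   250_000,
    ResumeAtDepth:   150_000,
    RefreshInterval: 500 * time.Millisecond,
    FailOpen:        false,
}

// Pass these to NewWindowLimiter and NewQueueDepthLimiter as in the earlier examples.
```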
What To Avoid¶
- Do not use only a request-rate limiter when the real failure mode is queue buildup.
- Do not use only a queue-depth limiter when one tenant can starve everyone else.
- Do not use only a concurrency limiter when many replicas together can still overload a shared downstream dependency.
Production systems usually need a combination, not a single universal limiter.