Rate Limiting Strategies to Protect APIs and Control Cloud Costs
Introduction
Rate limiting controls how many requests a client can make during a time window. It protects availability, ensures fair usage, and prevents surprise cloud bills caused by abusive or accidental traffic spikes.
Why it matters
- Stability: Prevents resource starvation and cascading failures.
- Fairness: Stops noisy neighbors from degrading service for others.
- Cost control: Caps bursty traffic that would otherwise inflate egress/compute costs.
- Abuse defense: Mitigates brute force, scraping, and DoS attempts.
Common techniques
- Fixed window: Allow N requests per minute/hour. Simple, but can allow bursts at window boundaries.
- Sliding window (log or counters): Counts requests in the last T seconds. Fairer than fixed window; higher state overhead if using logs.
- Token bucket: Tokens refill at rate r; requests consume 1 token. Supports controlled bursts up to bucket size b.
- Leaky bucket (queue): Processes at a constant rate; excess is queued or dropped. Smooths traffic aggressively.
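The sliding-window counter variant mentioned above can be sketched compactly: instead of storing a log of timestamps, keep one counter per fixed window and weight the previous window by how much of it still overlaps the sliding window. This in-memory sketch is illustrative only; a real deployment would keep the counters in a shared store such as Redis.

```javascript
// Sliding-window counter: approximates the request count in the last
// `windowMs` by blending the previous fixed window with the current one.
function makeSlidingWindowLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, current, previous }
  return function allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    let c = counters.get(key);
    if (!c || c.windowStart !== windowStart) {
      // Carry the finished window's count forward as "previous".
      const previous =
        c && c.windowStart === windowStart - windowMs ? c.current : 0;
      c = { windowStart, current: 0, previous };
      counters.set(key, c);
    }
    // Weight the previous window by its remaining overlap with the sliding window.
    const overlap = 1 - (now - windowStart) / windowMs;
    const estimated = c.previous * overlap + c.current;
    if (estimated >= limit) return false;
    c.current += 1;
    return true;
  };
}
```

Because the previous window's count decays gradually, a burst just before a window boundary still counts against requests just after it, which is exactly the boundary problem the fixed window has.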
Comparison table
| Strategy | Burst handling | Fairness | State/complexity | Good for |
|---|---|---|---|---|
| Fixed window | Allows boundary bursts | Lower | Low | Simple per-IP or per-key caps |
| Sliding window | Controls boundary bursts | High | Medium (logs) / Low (rolling counter) | Public APIs needing fairness |
| Token bucket | Supports limited bursts | High | Low | User-tier limits; sustained rate with bursts |
| Leaky bucket | Strict smoothing | High | Low–Medium | Backpressure on bursty writers |
Choosing keys and tiers
- Key by identity: API key, OAuth client, or user ID (not only IP).
- Segment by tier: Free vs. Pro vs. Enterprise limits.
- Path/operation classes: Stricter limits for expensive endpoints.
- Geo/region-aware: Apply limits close to where traffic enters.
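The key and tier choices above can be combined into one lookup per request. The tier names, limits, request shape, and the `/v1/reports` path below are all hypothetical, chosen only to illustrate the idea:

```javascript
// Hypothetical tier table; numbers are illustrative, not recommendations.
const TIER_LIMITS = {
  free: { rps: 2, burst: 10 },
  pro: { rps: 20, burst: 100 },
  enterprise: { rps: 200, burst: 1000 },
};

// Derive the limiter key and limits for a request.
function limitFor(req) {
  // Prefer an authenticated identity; fall back to IP only for anonymous traffic.
  const identity = req.apiKey ? `key:${req.apiKey}` : `ip:${req.ip}`;
  const tier = TIER_LIMITS[req.tier] || TIER_LIMITS.free;
  // Expensive endpoints get their own, stricter operation class.
  const opClass = req.path.startsWith('/v1/reports') ? 'expensive' : 'standard';
  const rps =
    opClass === 'expensive' ? Math.max(1, Math.floor(tier.rps / 10)) : tier.rps;
  return { key: `${identity}:${opClass}`, rps, burst: tier.burst };
}
```

Including the operation class in the key gives each class its own bucket, so a burst of expensive calls cannot exhaust the budget for cheap ones.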
Practical snippets
Nginx per-IP limit with burst (nginx's `limit_req` implements the leaky bucket algorithm; `nodelay` lets the burst through immediately instead of smoothing it)

```nginx
# /etc/nginx/conf.d/ratelimit.conf
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    listen 443 ssl http2;
    server_name api.example.com;

    location /v1/ {
        limit_req zone=perip burst=20 nodelay;
        proxy_pass http://backend;
    }
}
```
Express + Redis (token bucket-ish)

```javascript
// Per-key token bucket in Redis (ioredis-style API). Simplified JS shown;
// in production run the read-modify-write as a single Lua script so two
// concurrent requests cannot both read the same token count.
const rate = 5;    // tokens refilled per second
const burst = 50;  // bucket capacity
const ttl = 3600;  // seconds before an idle bucket expires

app.use(async (req, res, next) => {
  const key = `bucket:${req.user.id}`;
  const now = Date.now();
  const bucket = await redis.hgetall(key); // hash fields come back as strings
  const last = Number(bucket.last) || now;
  const prior = bucket.last ? Number(bucket.tokens) : burst; // new key starts full
  const elapsed = (now - last) / 1000;
  const tokens = Math.min(burst, prior + elapsed * rate);
  if (tokens < 1) {
    // Tell the client how long until at least one token is available.
    res.set('Retry-After', String(Math.ceil((1 - tokens) / rate)));
    return res.status(429).json({ error: 'Too Many Requests' });
  }
  await redis.hset(key, { tokens: tokens - 1, last: now });
  await redis.expire(key, ttl);
  next();
});
```
Return `429 Too Many Requests` with a `Retry-After` header. For idempotent calls, clients should wait at least the advertised delay before retrying.
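On the client side, honoring `Retry-After` can be sketched as a small retry wrapper. `doRequest` is a placeholder for your HTTP call, and the plain-object `headers` shape, retry cap, and jitter size are assumptions for illustration; jitter keeps many clients from retrying in lockstep.

```javascript
// Retry idempotent requests after a 429, honoring Retry-After with jitter.
async function fetchWithRetry(doRequest, { maxRetries = 3, maxDelayMs = 30000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Fall back to exponential backoff if the header is absent.
    const retryAfterSec = Number(res.headers['retry-after']) || 2 ** attempt;
    const jitter = Math.random() * 100; // up to 100 ms of random spread
    const delay = Math.min(retryAfterSec * 1000 + jitter, maxDelayMs);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```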
Best practices
- Layer limits: Edge (CDN/WAF) + gateway + app-level for expensive ops.
- Expose headers: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` for developer UX.
- Protect auth flows: Stricter limits on login, password reset, token minting.
- Adaptive limits: Tighten during incidents; loosen for trusted clients.
- Separate write vs. read: Writes usually get lower thresholds.
- Monitor and alert: Track 429s, latency, and cache hit ratios to tune values.
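Exposing the limit headers listed above is mostly bookkeeping; a sketch of translating limiter state into the conventional `X-RateLimit-*` headers (a de facto convention, with `Reset` as a Unix timestamp in seconds) might look like:

```javascript
// Translate limiter state into the conventional X-RateLimit-* headers.
function rateLimitHeaders({ limit, remaining, resetAtMs }) {
  return {
    'X-RateLimit-Limit': String(limit),
    // Never report negative or fractional tokens to clients.
    'X-RateLimit-Remaining': String(Math.max(0, Math.floor(remaining))),
    'X-RateLimit-Reset': String(Math.ceil(resetAtMs / 1000)),
  };
}
```

In the Express middleware earlier, these would be set with `res.set(...)` on every response, not just on 429s, so well-behaved clients can pace themselves before hitting the limit.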
Common pitfalls
- Only per-IP: Breaks behind NATs and misses authenticated abuse. Use identity keys.
- Clock skew: Distributed windows can drift—prefer server-side counters (Redis/Lua).
- Retry storms: Clients hammering after 429. Add jitter and backoff guidance.
- Caching interaction: Cacheable responses reduce pressure; set explicit `Cache-Control` headers.
Conclusion
Effective rate limiting blends the right algorithm (sliding or token bucket), correct keys (per user/app), and multi-layer enforcement. Do this well and you protect uptime, ensure fairness, and keep cloud costs predictable.
Tip: Start conservative, ship metrics, then tune limits per endpoint based on real traffic and error budgets.