Rate Limiting vs Throttling Explained: Protect APIs & Improve Performance

Imagine you're running a popular restaurant. If you let everyone in at once without any control, your kitchen gets overwhelmed, service quality drops, and the whole operation collapses. The same thing happens with APIs and web services when too many requests flood in at once.
This is where Rate Limiting and Throttling come in—they're like the bouncers and queue managers for your digital services, ensuring smooth operations even under heavy traffic.
In this blog, we'll break down these concepts in simple terms, understand why they matter, and explore how to implement them effectively.
What is Rate Limiting?
Rate Limiting is a technique to control how many requests a user or client can make to your API within a specific time window.
Real-World Analogy
Think of an ATM that allows you to withdraw money only 5 times per day. If you try a 6th time, it simply declines your request. That's rate limiting—setting a hard cap on usage.
How It Works
User makes request → Check request count →
If under limit: Process request
If over limit: Reject with 429 (Too Many Requests)
Common Rate Limiting Strategies
1. Fixed Window
Requests are counted within fixed time windows (e.g., per minute, per hour).
Example: 100 requests per hour, window resets at the top of each hour.
Time: 2:00 PM - 3:00 PM → 100 requests allowed
Time: 3:00 PM - 4:00 PM → Counter resets, 100 requests allowed again
Pros:
- Simple to implement
- Easy to understand
Cons:
- Can allow burst traffic at window boundaries (e.g., 100 requests at 2:59 PM + 100 at 3:00 PM = 200 in 1 minute)
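The fixed-window idea above can be sketched as a small in-memory counter. This is a single-process illustration only (the `FixedWindowLimiter` name and the injectable `now` parameter are choices made here for clarity and testability, not part of any library):

```javascript
// Fixed-window counter: one counter per client, reset when the window rolls over.
class FixedWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.counters = new Map(); // clientId -> { windowStart, count }
  }

  allow(clientId, now = Date.now()) {
    const entry = this.counters.get(clientId);
    // No entry yet, or the previous window expired: start a fresh one.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counters.set(clientId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.maxRequests) {
      return false; // over the cap for this window
    }
    entry.count++;
    return true;
  }
}
```

Passing `now` as a parameter keeps the logic deterministic and easy to unit-test; production code would simply call `Date.now()` internally.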
2. Sliding Window
A more sophisticated approach that uses a rolling time window.
Example: 100 requests in any 60-minute period.
At 2:30 PM, we check requests from 1:30 PM to 2:30 PM
At 2:31 PM, we check requests from 1:31 PM to 2:31 PM
Pros:
- Prevents burst abuse at boundaries
- More fair distribution
Cons:
- Slightly more complex to implement
- Requires more memory to track timestamps
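A sliding window is often implemented as a log of timestamps, evicting entries as they age out. A minimal in-memory sketch (single-process; names are illustrative):

```javascript
// Sliding-window log: keep one timestamp per request,
// drop timestamps that have slid out of the window.
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.logs = new Map(); // clientId -> array of request timestamps
  }

  allow(clientId, now = Date.now()) {
    const log = this.logs.get(clientId) || [];
    // Evict timestamps older than the window.
    const fresh = log.filter((t) => now - t < this.windowMs);
    if (fresh.length >= this.maxRequests) {
      this.logs.set(clientId, fresh);
      return false; // still at the limit within the rolling window
    }
    fresh.push(now);
    this.logs.set(clientId, fresh);
    return true;
  }
}
```

The memory cost mentioned above is visible here: one stored timestamp per request, versus a single counter for the fixed window.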
3. Token Bucket
Imagine a bucket that holds tokens. Each request consumes a token. The bucket refills at a constant rate.
Example: Bucket capacity = 100 tokens, refill rate = 10 tokens/second
Start: 100 tokens available
User makes 50 requests: 50 tokens left
Wait 5 seconds: 50 + (10 × 5) = 100 tokens (capped at bucket size)
Pros:
- Allows controlled bursts
- Smooth handling of variable traffic
- Industry standard
Cons:
- More complex logic
- Requires state management
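The token bucket walkthrough above translates into a few lines of state management. A sketch (time is passed in explicitly, in milliseconds, so the refill math is deterministic; the class name is illustrative):

```javascript
// Token bucket: capacity caps the burst size,
// the refill rate sets the sustained throughput.
class TokenBucket {
  constructor(capacity, refillRatePerSec) {
    this.capacity = capacity;
    this.refillRatePerSec = refillRatePerSec;
    this.tokens = capacity; // bucket starts full
    this.lastRefill = 0;
  }

  allow(nowMs) {
    // Refill for the elapsed time, capped at the bucket size.
    const elapsedSec = (nowMs - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillRatePerSec
    );
    this.lastRefill = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1; // each request spends one token
      return true;
    }
    return false; // bucket empty: reject (or queue) the request
  }
}
```

With the article's numbers (capacity 100, refill 10 tokens/second), 50 requests leave 50 tokens, and a 5-second pause refills the bucket back to its 100-token cap.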
4. Leaky Bucket
Similar to token bucket, but requests flow out at a constant rate, like water leaking from a bucket.
Example: Process 10 requests per second, queue overflow rejected
Incoming requests fill the bucket →
Requests are processed at constant rate →
If bucket overflows → Reject new requests
Pros:
- Smooth, constant output rate
- Good for downstream protection
Cons:
- Can introduce latency (queuing)
- Fixed processing rate
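The leaky bucket's "water level" can be modeled directly: arrivals raise the level, and it drains at a constant rate. A sketch (time injected in milliseconds for deterministic tests; names are illustrative):

```javascript
// Leaky bucket: arrivals fill the bucket, which drains at a constant rate.
// A full bucket rejects new arrivals.
class LeakyBucket {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity; // max queued requests
    this.leakRatePerSec = leakRatePerSec;
    this.level = 0; // current queue depth
    this.lastLeak = 0;
  }

  offer(nowMs) {
    // Drain for the elapsed time at the constant leak rate.
    const elapsedSec = (nowMs - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakRatePerSec);
    this.lastLeak = nowMs;
    if (this.level + 1 > this.capacity) {
      return false; // bucket would overflow: reject
    }
    this.level += 1; // request joins the queue
    return true;
  }
}
```

Note the contrast with the token bucket: here a full bucket means *too much queued work*, so bursts are smoothed out rather than allowed through.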
What is Throttling?
Throttling is about controlling the rate at which requests are processed, rather than simply rejecting them.
Real-World Analogy
Think of a highway toll plaza with multiple booths. When traffic increases, they open more booths, but there's still a maximum processing speed. Cars might slow down (throttle), but they eventually get through.
How It Differs from Rate Limiting
| Aspect | Rate Limiting | Throttling |
|---|---|---|
| Action | Rejects excess requests | Slows down request processing |
| Response | HTTP 429 (Too Many Requests) | Delayed response or queuing |
| User Impact | Request fails immediately | Request succeeds but slower |
| Use Case | Prevent abuse, protect resources | Manage load, ensure quality of service |
Types of Throttling
1. Request Throttling
Delay processing of requests to maintain a steady load.
// Pseudo-code
async function handleRequest(req) {
  if (currentLoad > threshold) {
    await sleep(calculateDelay()); // back off while load is high
  }
  return processRequest(req);
}
2. Bandwidth Throttling
Limit the data transfer rate for responses.
Example: Streaming video at 1 Mbps even if user's connection supports 10 Mbps.
3. Concurrent Request Throttling
Limit how many requests can be processed simultaneously.
Example: Allow max 5 concurrent requests per user, queue the rest.
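One way to sketch such a concurrency cap is a slot-plus-queue design (the `ConcurrencyThrottle` name and slot-handoff approach are illustrative choices made here; in practice a library such as `p-limit` is a common alternative):

```javascript
// Concurrency throttle: at most `limit` tasks run at once;
// extra callers wait in a FIFO queue until a running task hands over its slot.
class ConcurrencyThrottle {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.queue = []; // resolvers for parked callers
  }

  async run(task) {
    if (this.active < this.limit) {
      this.active++; // free slot: start immediately
    } else {
      // Park this caller until a finishing task passes its slot to us.
      await new Promise((resolve) => this.queue.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.queue.shift();
      if (next) next(); // hand the slot straight to the next waiter
      else this.active--; // nobody waiting: release the slot
    }
  }
}
```

Handing the slot directly to the next waiter (instead of decrementing and re-checking) avoids a race where a new caller could sneak in past the limit between the decrement and the wake-up.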
Why Do We Need Rate Limiting and Throttling?
1. Prevent DDoS Attacks
Malicious users can flood your API with requests to bring down your service. Rate limiting acts as a first line of defense.
Normal user: 10 requests/minute ✅
Attacker: 10,000 requests/minute ❌ Blocked
2. Ensure Fair Resource Usage
Without limits, a single user could hog all resources, degrading service for everyone else.
User A: Using 90% of server capacity
Users B-Z: Experiencing slowdowns or failures
With rate limiting:
User A: Limited to 10% capacity
Users B-Z: Fair share, smooth experience
3. Protect Backend Services
Your API might call databases, third-party services, or microservices. Rate limiting prevents overwhelming these dependencies.
4. Cost Control
Many cloud services charge per request or per computation. Rate limiting helps control costs by preventing runaway usage.
5. Maintain Service Quality
Throttling ensures consistent response times even during traffic spikes, maintaining a good user experience.
Implementation Strategies
1. Application-Level Rate Limiting
Implement logic directly in your application code.
Example: Node.js with Express
const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // Limit each IP to 100 requests per windowMs
  message: 'Too many requests from this IP, please try again later.',
  standardHeaders: true, // Return rate limit info in RateLimit-* headers
  legacyHeaders: false,
});

app.use('/api/', limiter);
Pros:
- Fine-grained control
- Custom logic per endpoint
Cons:
- Adds overhead to application
- Harder to scale across multiple servers
2. API Gateway Rate Limiting
Use an API Gateway (AWS API Gateway, Kong, NGINX) to handle rate limiting before requests reach your application.
Example: NGINX Configuration
http {
  limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

  server {
    location /api/ {
      limit_req zone=api_limit burst=20 nodelay;
      proxy_pass http://backend;
    }
  }
}
Pros:
- Offloads work from application
- Centralized control
- Better performance
Cons:
- Less flexible for complex logic
- Additional infrastructure
3. Redis-Based Rate Limiting
Use Redis for distributed rate limiting across multiple servers.
Example: Redis Fixed-Window Counter
const redis = require('redis');
const client = redis.createClient();
// Note: with node-redis v4+, call `await client.connect()` once at startup.

async function isRateLimited(userId, maxRequests, windowSeconds) {
  const key = `rate_limit:${userId}`;
  const current = await client.incr(key);
  // First request in the window: start the expiry clock
  if (current === 1) {
    await client.expire(key, windowSeconds);
  }
  return current > maxRequests;
}

// Usage
app.post('/api/data', async (req, res) => {
  const userId = req.user.id;
  if (await isRateLimited(userId, 100, 3600)) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  // Process request
  res.json({ data: 'success' });
});
Pros:
- Shared state across servers
- Fast performance
- Scales horizontally
Cons:
- Requires Redis infrastructure
- Additional complexity
4. Client-Side Throttling
Implement throttling in client applications to reduce unnecessary requests.
Example: JavaScript Debounce (strictly speaking, debouncing waits for activity to settle while throttling caps the rate; both cut request volume)
function debounce(func, delay) {
  let timeoutId;
  return function (...args) {
    clearTimeout(timeoutId);
    timeoutId = setTimeout(() => func.apply(this, args), delay);
  };
}

// Usage: Search as user types
const searchAPI = debounce(async (query) => {
  const response = await fetch(`/api/search?q=${encodeURIComponent(query)}`);
  // Handle response
}, 300); // Wait 300ms after user stops typing

searchInput.addEventListener('input', (e) => {
  searchAPI(e.target.value);
});
Best Practices
1. Communicate Limits Clearly
Return rate limit information in response headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 75
X-RateLimit-Reset: 1704038400
2. Use Appropriate HTTP Status Codes
- 429 Too Many Requests: Rate limit exceeded
- 503 Service Unavailable: Server overloaded (throttling)
3. Implement Graceful Degradation
Instead of hard rejections, consider:
- Queuing non-critical requests
- Returning cached data
- Reducing response detail
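The cached-data fallback might be sketched like this (illustration only: `fetchFresh` and the Map-based cache are stand-ins for a real backend call and cache, and a production version would also bound cache size and staleness):

```javascript
// Graceful degradation sketch: over the limit, serve cached (possibly stale)
// data instead of hard-rejecting the request.
const cache = new Map();

// Stand-in for a real backend lookup.
async function fetchFresh(key) {
  return `fresh:${key}`;
}

async function getData(key, isOverLimit) {
  if (isOverLimit && cache.has(key)) {
    // Degrade gracefully: stale but fast, and no load on the backend.
    return { data: cache.get(key), stale: true };
  }
  const fresh = await fetchFresh(key);
  cache.set(key, fresh);
  return { data: fresh, stale: false };
}
```

If a client is over the limit and nothing is cached, this sketch still fetches; a stricter version would return 429 in that branch.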
4. Different Limits for Different Users
Free tier: 1,000 requests/day
Basic tier: 10,000 requests/day
Premium tier: 100,000 requests/day
Enterprise: Unlimited with throttling
5. Monitor and Alert
Track metrics like:
- Rate limit hit rate
- 429 response count
- Average request rate per user
- Peak traffic patterns
6. Provide Retry-After Headers
Tell clients when they can retry:
HTTP/1.1 429 Too Many Requests
Retry-After: 3600
Content-Type: application/json

{
  "error": "Rate limit exceeded",
  "message": "Try again in 1 hour"
}
Common Pitfalls to Avoid
1. Rate Limiting by IP Only
Problem: Multiple users behind the same NAT/proxy share one IP.
Solution: Use authentication tokens or user IDs when possible.
2. Too Restrictive Limits
Problem: Legitimate users get blocked.
Solution: Analyze usage patterns and set realistic limits.
3. No Burst Allowance
Problem: Users can't handle sudden legitimate spikes.
Solution: Use token bucket or allow small bursts.
4. Ignoring Different Endpoints
Problem: Cheap operations limited same as expensive ones.
Solution: Implement per-endpoint rate limits:
app.get('/api/cheap', rateLimiter(1000)); // 1000 req/hour
app.post('/api/expensive', rateLimiter(100)); // 100 req/hour
5. Poor Error Messages
❌ Bad: "Error 429"
✅ Good: "Rate limit exceeded. You've made 101 requests in the last hour. Limit is 100/hour. Try again at 3:00 PM UTC."
Real-World Examples
1. GitHub API
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1372700873
- Unauthenticated: 60 requests/hour
- Authenticated: 5,000 requests/hour
2. Twitter API
Uses a combination of rate limiting and throttling:
- Rate limits per 15-minute window
- Different limits for different endpoints
- Throttles during high platform load
3. Stripe API
- Rate limiting by account (not just IP)
- Allows bursts using token bucket
- Provides detailed headers and documentation
Implementation Example: Complete System
Here's a more complete rate limiting system: a Redis-backed sliding window with temporary blocking for repeat offenders:
const Redis = require('ioredis');
const redis = new Redis();

class RateLimiter {
  constructor(options = {}) {
    this.maxRequests = options.maxRequests || 100;
    this.windowSeconds = options.windowSeconds || 3600;
    this.blockDurationSeconds = options.blockDurationSeconds || 900;
  }

  async checkLimit(identifier) {
    const key = `rate_limit:${identifier}`;
    const blockKey = `rate_limit:blocked:${identifier}`;

    // Check if user is blocked
    const isBlocked = await redis.exists(blockKey);
    if (isBlocked) {
      const ttl = await redis.ttl(blockKey);
      return {
        allowed: false,
        blocked: true,
        retryAfter: ttl
      };
    }

    // Sliding window using sorted sets
    const now = Date.now();
    const windowStart = now - (this.windowSeconds * 1000);

    // Remove old entries
    await redis.zremrangebyscore(key, 0, windowStart);

    // Count requests in current window
    const requestCount = await redis.zcard(key);
    if (requestCount >= this.maxRequests) {
      // Block user for blockDurationSeconds
      await redis.setex(blockKey, this.blockDurationSeconds, '1');
      return {
        allowed: false,
        blocked: true,
        retryAfter: this.blockDurationSeconds,
        currentCount: requestCount,
        limit: this.maxRequests
      };
    }

    // Add current request
    await redis.zadd(key, now, `${now}-${Math.random()}`);
    await redis.expire(key, this.windowSeconds);

    return {
      allowed: true,
      blocked: false,
      currentCount: requestCount + 1,
      limit: this.maxRequests,
      remaining: this.maxRequests - requestCount - 1,
      resetAt: now + (this.windowSeconds * 1000)
    };
  }
}

// Express middleware
function createRateLimitMiddleware(limiter) {
  return async (req, res, next) => {
    const identifier = req.user?.id || req.ip;
    try {
      const result = await limiter.checkLimit(identifier);

      // Set headers
      res.set({
        'X-RateLimit-Limit': result.limit,
        'X-RateLimit-Remaining': result.remaining || 0,
        'X-RateLimit-Reset': result.resetAt || Date.now()
      });

      if (!result.allowed) {
        res.set('Retry-After', result.retryAfter);
        return res.status(429).json({
          error: 'Rate limit exceeded',
          message: result.blocked
            ? `Too many requests. Blocked for ${result.retryAfter} seconds.`
            : `Rate limit of ${result.limit} requests per ${limiter.windowSeconds} seconds exceeded.`,
          retryAfter: result.retryAfter
        });
      }
      next();
    } catch (error) {
      console.error('Rate limiter error:', error);
      // Fail open - allow request if rate limiter fails
      next();
    }
  };
}

// Usage
const apiLimiter = new RateLimiter({
  maxRequests: 100,
  windowSeconds: 3600,
  blockDurationSeconds: 900
});

app.use('/api/', createRateLimitMiddleware(apiLimiter));
Testing Your Rate Limiting
Always test your implementation thoroughly:
// Load testing script
const axios = require('axios');

async function testRateLimit() {
  const results = {
    success: 0,
    rateLimited: 0,
    errors: 0
  };

  // Fire 150 requests (limit is 100)
  const promises = Array.from({ length: 150 }, async (_, i) => {
    try {
      await axios.get('http://localhost:3000/api/data');
      results.success++;
      console.log(`Request ${i + 1}: Success`);
    } catch (error) {
      if (error.response?.status === 429) {
        results.rateLimited++;
        console.log(`Request ${i + 1}: Rate limited`);
      } else {
        results.errors++;
        console.log(`Request ${i + 1}: Error ${error.message}`);
      }
    }
  });

  await Promise.all(promises);
  console.log('\nTest Results:', results);
  console.log('Expected ~100 success, ~50 rate limited');
}

testRateLimit();
Conclusion
Rate limiting and throttling are essential tools in every backend developer's toolkit. They protect your services, ensure fair usage, and maintain quality of service under load.
Key Takeaways:
- Rate Limiting = Setting hard caps on request counts
- Throttling = Controlling the pace of processing
- Use the right strategy for your use case (Fixed Window, Token Bucket, etc.)
- Implement at the right layer (Application, Gateway, or both)
- Communicate limits clearly to users
- Monitor and adjust based on real usage patterns
Start with simple rate limiting (like fixed window), and evolve to more sophisticated approaches as your system scales. Your future self (and your servers) will thank you!
Additional Resources
- IETF Rate Limiting Standards
- Redis Rate Limiting Patterns
- AWS API Gateway Throttling
- Kong Rate Limiting Plugin
Found this helpful? Follow me for more deep dives into backend engineering and system design!
Connect with me:
- LinkedIn: Neeraj Prajapati
- GitHub: neerajsde
- Instagram: @neeraj.devx