Rate Limiting vs Throttling Explained: Protect APIs & Improve Performance

Imagine you're running a popular restaurant. If you let everyone in at once without any control, your kitchen gets overwhelmed, service quality drops, and the whole operation collapses. The same thing happens with APIs and web services when too many requests flood in at once.
This is where Rate Limiting and Throttling come in—they're like the bouncers and queue managers for your digital services, ensuring smooth operations even under heavy traffic.
In this blog, we'll break down these concepts in simple terms, understand why they matter, and explore how to implement them effectively.
What is Rate Limiting?
Rate Limiting is a technique to control how many requests a user or client can make to your API within a specific time window.
Real-World Analogy
Think of an ATM that allows you to withdraw money only 5 times per day. If you try a 6th time, it simply declines your request. That's rate limiting—setting a hard cap on usage.
How It Works
User makes request → Check request count →
If under limit: Process request
If over limit: Reject with 429 (Too Many Requests)
Common Rate Limiting Strategies
1. Fixed Window
Requests are counted within fixed time windows (e.g., per minute, per hour).
Example: 100 requests per hour, window resets at the top of each hour.
Time: 2:00 PM - 3:00 PM → 100 requests allowed
Time: 3:00 PM - 4:00 PM → Counter resets, 100 requests allowed again
Pros:
- Simple to implement
- Easy to understand
Cons:
- Can allow burst traffic at window boundaries (e.g., 100 requests at 2:59 PM + 100 at 3:00 PM = 200 in 1 minute)
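The fixed-window idea above can be sketched as a small in-memory counter. This is a single-process illustration only (the `FixedWindowLimiter` name and the injectable `now` parameter are choices made here for clarity and testability, not part of any library):

```javascript
// Fixed-window counter: one counter per client, reset when the window rolls over.
class FixedWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.counters = new Map(); // clientId -> { windowStart, count }
  }

  allow(clientId, now = Date.now()) {
    const entry = this.counters.get(clientId);
    // No entry yet, or the previous window expired: start a fresh one.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counters.set(clientId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.maxRequests) {
      return false; // over the cap for this window
    }
    entry.count++;
    return true;
  }
}
```

Passing `now` as a parameter keeps the logic deterministic and easy to unit-test; production code would simply call `Date.now()` internally.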
2. Sliding Window
A more sophisticated approach that uses a rolling time window.
Example: 100 requests in any 60-minute period.
At 2:30 PM, we check requests from 1:30 PM to 2:30 PM
At 2:31 PM, we check requests from 1:31 PM to 2:31 PM
Pros:
- Prevents burst abuse at boundaries
- More fair distribution
Cons:
- Slightly more complex to implement
- Requires more memory to track timestamps
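A sliding window is often implemented as a log of timestamps, evicting entries as they age out. A minimal in-memory sketch (single-process; names are illustrative):

```javascript
// Sliding-window log: keep one timestamp per request,
// drop timestamps that have slid out of the window.
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.logs = new Map(); // clientId -> array of request timestamps
  }

  allow(clientId, now = Date.now()) {
    const log = this.logs.get(clientId) || [];
    // Evict timestamps older than the window.
    const fresh = log.filter((t) => now - t < this.windowMs);
    if (fresh.length >= this.maxRequests) {
      this.logs.set(clientId, fresh);
      return false; // still at the limit within the rolling window
    }
    fresh.push(now);
    this.logs.set(clientId, fresh);
    return true;
  }
}
```

The memory cost mentioned above is visible here: one stored timestamp per request, versus a single counter for the fixed window.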
3. Token Bucket
Imagine a bucket that holds tokens. Each request consumes a token. The bucket refills at a constant rate.
Example: Bucket capacity = 100 tokens, refill rate = 10 tokens/second
Start: 100 tokens available
User makes 50 requests: 50 tokens left
Wait 5 seconds: 50 + (10 × 5) = 100 tokens (capped at bucket size)
Pros:
- Allows controlled bursts
- Smooth handling of variable traffic
- Industry standard
Cons:
- More complex logic
- Requires state management
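The token bucket walkthrough above translates into a few lines of state management. A sketch (time is passed in explicitly, in milliseconds, so the refill math is deterministic; the class name is illustrative):

```javascript
// Token bucket: capacity caps the burst size,
// the refill rate sets the sustained throughput.
class TokenBucket {
  constructor(capacity, refillRatePerSec) {
    this.capacity = capacity;
    this.refillRatePerSec = refillRatePerSec;
    this.tokens = capacity; // bucket starts full
    this.lastRefill = 0;
  }

  allow(nowMs) {
    // Refill for the elapsed time, capped at the bucket size.
    const elapsedSec = (nowMs - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillRatePerSec
    );
    this.lastRefill = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1; // each request spends one token
      return true;
    }
    return false; // bucket empty: reject (or queue) the request
  }
}
```

With the article's numbers (capacity 100, refill 10 tokens/second), 50 requests leave 50 tokens, and a 5-second pause refills the bucket back to its 100-token cap.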
4. Leaky Bucket
Similar to token bucket, but requests flow out at a constant rate, like water leaking from a bucket.
Example: Process 10 requests per second, queue overflow rejected
Incoming requests fill the bucket →
Requests are processed at constant rate →
If bucket overflows → Reject new requests
Pros:
- Smooth, constant output rate
- Good for downstream protection
Cons:
- Can introduce latency (queuing)
- Fixed processing rate
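The leaky bucket's "water level" can be modeled directly: arrivals raise the level, and it drains at a constant rate. A sketch (time injected in milliseconds for deterministic tests; names are illustrative):

```javascript
// Leaky bucket: arrivals fill the bucket, which drains at a constant rate.
// A full bucket rejects new arrivals.
class LeakyBucket {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity; // max queued requests
    this.leakRatePerSec = leakRatePerSec;
    this.level = 0; // current queue depth
    this.lastLeak = 0;
  }

  offer(nowMs) {
    // Drain for the elapsed time at the constant leak rate.
    const elapsedSec = (nowMs - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakRatePerSec);
    this.lastLeak = nowMs;
    if (this.level + 1 > this.capacity) {
      return false; // bucket would overflow: reject
    }
    this.level += 1; // request joins the queue
    return true;
  }
}
```

Note the contrast with the token bucket: here a full bucket means *too much queued work*, so bursts are smoothed out rather than allowed through.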
What is Throttling?
Throttling is about controlling the rate at which requests are processed, rather than simply rejecting them.
Real-World Analogy
Think of a highway toll plaza with multiple booths. When traffic increases, they open more booths, but there's still a maximum processing speed. Cars might slow down (throttle), but they eventually get through.
How It Differs from Rate Limiting
| Aspect | Rate Limiting | Throttling |
|---|---|---|
| Action | Rejects excess requests | Slows down request processing |
| Response | HTTP 429 (Too Many Requests) | Delayed response or queuing |
| User Impact | Request fails immediately | Request succeeds but slower |
| Use Case | Prevent abuse, protect resources | Manage load, ensure quality of service |
Types of Throttling
1. Request Throttling
Delay processing of requests to maintain a steady load.
// Pseudo-code
async function handleRequest(req) {
  if (currentLoad > threshold) {
    await sleep(calculateDelay()); // back off while load is high
  }
  return processRequest(req);
}
2. Bandwidth Throttling
Limit the data transfer rate for responses.
Example: Streaming video at 1 Mbps even if user's connection supports 10 Mbps.
3. Concurrent Request Throttling
Limit how many requests can be processed simultaneously.
Example: Allow max 5 concurrent requests per user, queue the rest.
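One way to sketch such a concurrency cap is a slot-plus-queue design (the `ConcurrencyThrottle` name and slot-handoff approach are illustrative choices made here; in practice a library such as `p-limit` is a common alternative):

```javascript
// Concurrency throttle: at most `limit` tasks run at once;
// extra callers wait in a FIFO queue until a running task hands over its slot.
class ConcurrencyThrottle {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.queue = []; // resolvers for parked callers
  }

  async run(task) {
    if (this.active < this.limit) {
      this.active++; // free slot: start immediately
    } else {
      // Park this caller until a finishing task passes its slot to us.
      await new Promise((resolve) => this.queue.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.queue.shift();
      if (next) next(); // hand the slot straight to the next waiter
      else this.active--; // nobody waiting: release the slot
    }
  }
}
```

Handing the slot directly to the next waiter (instead of decrementing and re-checking) avoids a race where a new caller could sneak in past the limit between the decrement and the wake-up.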
Why Do We Need Rate Limiting and Throttling?
1. Prevent DDoS Attacks
Malicious users can flood your API with requests to bring down your service. Rate limiting acts as a first line of defense.
Normal user: 10 requests/minute ✅
Attacker: 10,000 requests/minute ❌ Blocked
2. Ensure Fair Resource Usage
Without limits, a single user could hog all resources, degrading service for everyone else.
User A: Using 90% of server capacity
Users B-Z: Experiencing slowdowns or failures
With rate limiting:
User A: Limited to 10% capacity
Users B-Z: Fair share, smooth experience
3. Protect Backend Services
Your API might call databases, third-party services, or microservices. Rate limiting prevents overwhelming these dependencies.
4. Cost Control
Many cloud services charge per request or per computation. Rate limiting helps control costs by preventing runaway usage.
5. Maintain Service Quality
Throttling ensures consistent response times even during traffic spikes, maintaining a good user experience.
Implementation Strategies
1. Application-Level Rate Limiting
Implement logic directly in your application code.
Example: Node.js with Express
const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // Limit each IP to 100 requests per windowMs
  message: 'Too many requests from this IP, please try again later.',
  standardHeaders: true, // Return rate limit info in RateLimit-* headers
  legacyHeaders: false,
});

app.use('/api/', limiter);
Pros:
- Fine-grained control
- Custom logic per endpoint
Cons:
- Adds overhead to application
- Harder to scale across multiple servers
2. API Gateway Rate Limiting
Use an API Gateway (AWS API Gateway, Kong, NGINX) to handle rate limiting before requests reach your application.
Example: NGINX Configuration
http {
  limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

  server {
    location /api/ {
      limit_req zone=api_limit burst=20 nodelay;
      proxy_pass http://backend;
    }
  }
}
Pros:
- Offloads work from application
- Centralized control
- Better performance
Cons:
- Less flexible for complex logic
- Additional infrastructure
3. Redis-Based Rate Limiting
Use Redis for distributed rate limiting across multiple servers.
Example: Redis Fixed-Window Counter
const redis = require('redis');
const client = redis.createClient();
// Note: with node-redis v4+, call `await client.connect()` once at startup.

async function isRateLimited(userId, maxRequests, windowSeconds) {
  const key = `rate_limit:${userId}`;
  const current = await client.incr(key);
  // First request in the window: start the expiry clock
  if (current === 1) {
    await client.expire(key, windowSeconds);
  }
  return current > maxRequests;
}

// Usage
app.post('/api/data', async (req, res) => {
  const userId = req.user.id;
  if (await isRateLimited(userId, 100, 3600)) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  // Process request
  res.json({ data: 'success' });
});
Pros:
- Shared state across servers
- Fast performance
- Scales horizontally
Cons:
- Requires Redis infrastructure
- Additional complexity
4. Client-Side Throttling
Implement throttling in client applications to reduce unnecessary requests.
Example: JavaScript Debounce (strictly speaking, debouncing waits for activity to settle while throttling caps the rate; both cut request volume)
function debounce(func, delay) {
  let timeoutId;
  return function (...args) {
    clearTimeout(timeoutId);
    timeoutId = setTimeout(() => func.apply(this, args), delay);
  };
}

// Usage: Search as user types
const searchAPI = debounce(async (query) => {
  const response = await fetch(`/api/search?q=${encodeURIComponent(query)}`);
  // Handle response
}, 300); // Wait 300ms after user stops typing

searchInput.addEventListener('input', (e) => {
  searchAPI(e.target.value);
});
Best Practices
1. Communicate Limits Clearly
Return rate limit information in response headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 75
X-RateLimit-Reset: 1704038400
2. Use Appropriate HTTP Status Codes
- 429 Too Many Requests: Rate limit exceeded
- 503 Service Unavailable: Server overloaded (throttling)
3. Implement Graceful Degradation
Instead of hard rejections, consider:
- Queuing non-critical requests
- Returning cached data
- Reducing response detail
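The cached-data fallback might be sketched like this (illustration only: `fetchFresh` and the Map-based cache are stand-ins for a real backend call and cache, and a production version would also bound cache size and staleness):

```javascript
// Graceful degradation sketch: over the limit, serve cached (possibly stale)
// data instead of hard-rejecting the request.
const cache = new Map();

// Stand-in for a real backend lookup.
async function fetchFresh(key) {
  return `fresh:${key}`;
}

async function getData(key, isOverLimit) {
  if (isOverLimit && cache.has(key)) {
    // Degrade gracefully: stale but fast, and no load on the backend.
    return { data: cache.get(key), stale: true };
  }
  const fresh = await fetchFresh(key);
  cache.set(key, fresh);
  return { data: fresh, stale: false };
}
```

If a client is over the limit and nothing is cached, this sketch still fetches; a stricter version would return 429 in that branch.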
4. Different Limits for Different Users
Free tier: 1,000 requests/day
Basic tier: 10,000 requests/day
Premium tier: 100,000 requests/day
Enterprise: Unlimited with throttling
5. Monitor and Alert
Track metrics like:
- Rate limit hit rate
- 429 response count
- Average request rate per user
- Peak traffic patterns
6. Provide Retry-After Headers
Tell clients when they can retry:
HTTP/1.1 429 Too Many Requests
Retry-After: 3600
Content-Type: application/json

{
  "error": "Rate limit exceeded",
  "message": "Try again in 1 hour"
}
Common Pitfalls to Avoid
1. Rate Limiting by IP Only
Problem: Multiple users behind the same NAT/proxy share one IP.
Solution: Use authentication tokens or user IDs when possible.
2. Too Restrictive Limits
Problem: Legitimate users get blocked.
Solution: Analyze usage patterns and set realistic limits.
3. No Burst Allowance
Problem: Users can't handle sudden legitimate spikes.
Solution: Use token bucket or allow small bursts.
4. Ignoring Different Endpoints
Problem: Cheap operations limited same as expensive ones.
Solution: Implement per-endpoint rate limits:
app.get('/api/cheap', rateLimiter(1000)); // 1000 req/hour
app.post('/api/expensive', rateLimiter(100)); // 100 req/hour
5. Poor Error Messages
❌ Bad: "Error 429"
✅ Good: "Rate limit exceeded. You've made 101 requests in the last hour. Limit is 100/hour. Try again at 3:00 PM UTC."
Real-World Examples
1. GitHub API
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1372700873
- Unauthenticated: 60 requests/hour
- Authenticated: 5,000 requests/hour
2. Twitter API
Uses a combination of rate limiting and throttling:
- Rate limits per 15-minute window
- Different limits for different endpoints
- Throttles during high platform load
3. Stripe API
- Rate limiting by account (not just IP)
- Allows bursts using token bucket
- Provides detailed headers and documentation
Implementation Example: Complete System
Here's a more complete rate limiting system: a Redis-backed sliding window with temporary blocking for repeat offenders:
const Redis = require('ioredis');
const redis = new Redis();

class RateLimiter {
  constructor(options = {}) {
    this.maxRequests = options.maxRequests || 100;
    this.windowSeconds = options.windowSeconds || 3600;
    this.blockDurationSeconds = options.blockDurationSeconds || 900;
  }

  async checkLimit(identifier) {
    const key = `rate_limit:${identifier}`;
    const blockKey = `rate_limit:blocked:${identifier}`;

    // Check if user is blocked
    const isBlocked = await redis.exists(blockKey);
    if (isBlocked) {
      const ttl = await redis.ttl(blockKey);
      return {
        allowed: false,
        blocked: true,
        retryAfter: ttl
      };
    }

    // Sliding window using sorted sets
    const now = Date.now();
    const windowStart = now - (this.windowSeconds * 1000);

    // Remove old entries
    await redis.zremrangebyscore(key, 0, windowStart);

    // Count requests in current window
    const requestCount = await redis.zcard(key);
    if (requestCount >= this.maxRequests) {
      // Block user for blockDurationSeconds
      await redis.setex(blockKey, this.blockDurationSeconds, '1');
      return {
        allowed: false,
        blocked: true,
        retryAfter: this.blockDurationSeconds,
        currentCount: requestCount,
        limit: this.maxRequests
      };
    }

    // Add current request
    await redis.zadd(key, now, `${now}-${Math.random()}`);
    await redis.expire(key, this.windowSeconds);

    return {
      allowed: true,
      blocked: false,
      currentCount: requestCount + 1,
      limit: this.maxRequests,
      remaining: this.maxRequests - requestCount - 1,
      resetAt: now + (this.windowSeconds * 1000)
    };
  }
}

// Express middleware
function createRateLimitMiddleware(limiter) {
  return async (req, res, next) => {
    const identifier = req.user?.id || req.ip;
    try {
      const result = await limiter.checkLimit(identifier);

      // Set headers
      res.set({
        'X-RateLimit-Limit': result.limit,
        'X-RateLimit-Remaining': result.remaining || 0,
        'X-RateLimit-Reset': result.resetAt || Date.now()
      });

      if (!result.allowed) {
        res.set('Retry-After', result.retryAfter);
        return res.status(429).json({
          error: 'Rate limit exceeded',
          message: result.blocked
            ? `Too many requests. Blocked for ${result.retryAfter} seconds.`
            : `Rate limit of ${result.limit} requests per ${limiter.windowSeconds} seconds exceeded.`,
          retryAfter: result.retryAfter
        });
      }
      next();
    } catch (error) {
      console.error('Rate limiter error:', error);
      // Fail open - allow request if rate limiter fails
      next();
    }
  };
}

// Usage
const apiLimiter = new RateLimiter({
  maxRequests: 100,
  windowSeconds: 3600,
  blockDurationSeconds: 900
});

app.use('/api/', createRateLimitMiddleware(apiLimiter));
Testing Your Rate Limiting
Always test your implementation thoroughly:
// Load testing script
const axios = require('axios');

async function testRateLimit() {
  const results = {
    success: 0,
    rateLimited: 0,
    errors: 0
  };

  // Fire 150 requests (limit is 100)
  const promises = Array.from({ length: 150 }, async (_, i) => {
    try {
      await axios.get('http://localhost:3000/api/data');
      results.success++;
      console.log(`Request ${i + 1}: Success`);
    } catch (error) {
      if (error.response?.status === 429) {
        results.rateLimited++;
        console.log(`Request ${i + 1}: Rate limited`);
      } else {
        results.errors++;
        console.log(`Request ${i + 1}: Error ${error.message}`);
      }
    }
  });

  await Promise.all(promises);
  console.log('\nTest Results:', results);
  console.log('Expected ~100 success, ~50 rate limited');
}

testRateLimit();
Conclusion
Rate limiting and throttling are essential tools in every backend developer's toolkit. They protect your services, ensure fair usage, and maintain quality of service under load.
Key Takeaways:
- Rate Limiting = Setting hard caps on request counts
- Throttling = Controlling the pace of processing
- Use the right strategy for your use case (Fixed Window, Token Bucket, etc.)
- Implement at the right layer (Application, Gateway, or both)
- Communicate limits clearly to users
- Monitor and adjust based on real usage patterns
Start with simple rate limiting (like fixed window), and evolve to more sophisticated approaches as your system scales. Your future self (and your servers) will thank you!
Additional Resources
- IETF Rate Limiting Standards
- Redis Rate Limiting Patterns
- AWS API Gateway Throttling
- Kong Rate Limiting Plugin
Found this helpful? Follow me for more deep dives into backend engineering and system design!
Connect with me:
- LinkedIn: Neeraj Prajapati
- GitHub: neerajsde
- Instagram: @neeraj.devx