5 Best Ways to Implement Rate Limiting for Users in Python

πŸ’‘ Problem Formulation: In web services, it’s crucial to prevent abuse and ensure fair resource use by imposing a rate limit on user actions. For instance, an API may allow a user 100 requests per hour. Surpassing this limit should prompt the system to block further activity for a set period. This article details how to construct data structures in Python to efficiently monitor and enforce such rate limits.

Method 1: Using a Fixed Window Counter

This method maintains a request counter per user and a timestamp of when the current window started. Each request increments the counter; once the count exceeds the limit before the window resets, access is denied. It's simple, but it can admit up to twice the limit in a short burst straddling a window boundary.

Here’s an example:

import time

class FixedWindowRateLimiter:
    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size
        self.request_counts = {}
        self.window_start = time.time()
    
    def allow_request(self, user_id):
        # Start a fresh window (and clear every user's counter) once the
        # current window has elapsed
        if time.time() > self.window_start + self.window_size:
            self.request_counts = {}
            self.window_start = time.time()
        self.request_counts[user_id] = self.request_counts.get(user_id, 0) + 1
        return self.request_counts[user_id] <= self.max_requests

rate_limiter = FixedWindowRateLimiter(100, 3600)
user_id = 'user_123'
print(rate_limiter.allow_request(user_id))

The output will be:

True

This code snippet defines a simple rate limiter that resets request counts once the window period elapses. It increments a user's count with each request and returns True as long as the user is within their request limit.
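To see the boundary-burst weakness concretely, here is a minimal sketch that swaps time.time() for an injectable clock so the run is reproducible (the FixedWindow name and allow method are illustrative, not part of the snippet above):

```python
class FixedWindow:
    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size
        self.count = 0
        self.window_start = 0.0

    def allow(self, now):
        # Reset the counter once the window has elapsed
        if now >= self.window_start + self.window_size:
            self.count = 0
            self.window_start = now
        self.count += 1
        return self.count <= self.max_requests

limiter = FixedWindow(max_requests=100, window_size=3600)
# 100 requests at t=3599, the last second of window 1, all pass...
burst1 = sum(limiter.allow(3599) for _ in range(100))
# ...and 100 more at t=3600, the first second of window 2, also pass
burst2 = sum(limiter.allow(3600) for _ in range(100))
print(burst1 + burst2)  # 200 requests within two seconds
```

Both bursts are admitted even though 200 requests arrive back to back, which is exactly the spike the sliding-window approaches below are designed to prevent.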

Method 2: Sliding Window Log

The sliding window log expands upon the fixed window approach by keeping a timestamp for each user request within the window. This provides more evenly distributed access within the timeframe and prevents short bursts of excessive traffic.

Here’s an example:

from collections import deque
from time import time

class SlidingWindowRateLimiter:
    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size
        self.logs = {}
        
    def allow_request(self, user_id):
        current_time = time()
        log = self.logs.setdefault(user_id, deque())
        # Evict timestamps that have aged out of the window
        while log and log[0] <= current_time - self.window_size:
            log.popleft()
        # Record the request only if it is allowed, so denied requests
        # do not count against the limit
        if len(log) < self.max_requests:
            log.append(current_time)
            return True
        return False

rate_limiter = SlidingWindowRateLimiter(100, 3600)
user_id = 'user_123'
print(rate_limiter.allow_request(user_id))

The output will be:

True

This snippet demonstrates a rate limiter that keeps a log of request timestamps per user. By evicting timestamps older than the rate limit window, it maintains an accurate count of recent requests and uses that count to decide whether the current request should be allowed.
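To contrast with the fixed window, the following sketch uses an injectable clock (the SlidingWindow name is illustrative, and here a denied request does not consume a window slot): a burst at the end of one hour still blocks requests at the start of the next.

```python
from collections import deque

class SlidingWindow:
    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size
        self.log = deque()

    def allow(self, now):
        # Drop timestamps that have fallen out of the window
        while self.log and self.log[0] <= now - self.window_size:
            self.log.popleft()
        if len(self.log) < self.max_requests:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindow(max_requests=100, window_size=3600)
allowed_late = sum(limiter.allow(3599) for _ in range(100))  # all 100 pass
# One second later the log still holds 100 recent timestamps, so the
# double-burst that slipped through the fixed window is rejected here
allowed_next = sum(limiter.allow(3600) for _ in range(100))
print(allowed_late, allowed_next)  # 100 0
```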

Method 3: Token Bucket Algorithm

The token bucket algorithm combines a refillable token bucket with a fixed capacity. Each token represents permission for a single request. Tokens are added at a steady rate, reflecting the rate limit. If enough tokens are available, a request proceeds; otherwise, it is rate-limited.

Here’s an example:

import time

class TokenBucketRateLimiter:
    def __init__(self, rate, capacity):
        self.capacity = capacity
        self.tokens = capacity
        self.rate = rate
        self.last_check = time.time()

    def allow_request(self, amount=1):
        current_time = time.time()
        passed_time = current_time - self.last_check
        self.last_check = current_time

        # Refill tokens at the steady rate, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + passed_time * self.rate)

        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

rate_limiter = TokenBucketRateLimiter(1/36, 100)  # ~1 token every 36s, i.e. 100 per hour
print(rate_limiter.allow_request())

The output will be:

True

This code sets up a token bucket with a specified rate and capacity. Each call to allow_request checks and refills the bucket with tokens based on the elapsed time since the last check and then grants or denies the request based on available tokens.
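The burst-then-steady behaviour is easy to demonstrate with a deterministic clock; this is a sketch with illustrative names (TokenBucket, allow), not the class above:

```python
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # bucket starts full
        self.last = 0.0

    def allow(self, now, amount=1):
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)  # 2 tokens/sec, burst of up to 5
burst = sum(bucket.allow(0) for _ in range(10))  # 5 allowed, 5 denied
later = bucket.allow(1)  # by t=1 two tokens have refilled, so this passes
print(burst, later)  # 5 True
```

The initial burst drains the full bucket at once; afterwards the sustained rate is bounded by the refill rate alone.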

Method 4: Leaky Bucket Algorithm

The leaky bucket algorithm is another metaphor for rate-limiting, where requests fill the bucket to its capacity. Requests are processed at a steady pace, analogous to water leaking from a bucket. If the bucket is full, additional requests are rate-limited or discarded.

Here’s an example:

import time
from collections import deque

class LeakyBucketRateLimiter:
    def __init__(self, rate, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.rate = rate

    def allow_request(self):
        current_time = time.time()
        # Requests older than capacity/rate seconds have fully leaked out
        while self.queue and self.queue[0] < current_time - self.capacity / self.rate:
            self.queue.popleft()
        # Only allowed requests occupy space in the bucket
        if len(self.queue) < self.capacity:
            self.queue.append(current_time)
            return True
        return False

rate_limiter = LeakyBucketRateLimiter(1/36, 100)
print(rate_limiter.allow_request())

The output will be:

True

In this example, we've implemented the leaky bucket as a queue of request timestamps: each incoming request adds an entry, and entries older than capacity/rate seconds are removed ("leak" out). If the bucket is already at capacity when a request arrives, that request is rate-limited.
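An alternative and equally common formulation tracks the bucket as a single "water level" counter instead of a queue. The following sketch (with an injectable clock and illustrative names, not the class above) drains the level at the leak rate and adds one unit per request:

```python
class LeakyBucketMeter:
    def __init__(self, rate, capacity):
        self.rate = rate          # leak rate in requests per second
        self.capacity = capacity  # maximum bucket level
        self.water = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain at the steady leak rate, never below empty
        self.water = max(0.0, self.water - (now - self.last) * self.rate)
        self.last = now
        if self.water < self.capacity:
            self.water += 1
            return True
        return False

meter = LeakyBucketMeter(rate=1, capacity=3)
results = [meter.allow(0) for _ in range(4)]
print(results)  # [True, True, True, False] - the fourth request overflows
```

This variant needs only O(1) memory per user, which matters when tracking many users.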

Bonus One-Liner Method 5: Using Decorators

Python decorators can be used to implement rate limiting with a single line above a function definition, using libraries like ratelimit that handle the underlying details.

Here’s an example:

from ratelimit import limits, sleep_and_retry

@sleep_and_retry
@limits(calls=100, period=3600)
def limited_function():
    pass

print(limited_function())

The output will be:

None

By using decorators from the ratelimit library, we can create a function that is intrinsically rate-limited. The @limits decorator specifies the number of calls and period, while @sleep_and_retry will wait until it’s possible to execute the function again.
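If pulling in a third-party library is not an option, a comparable decorator can be hand-rolled. The sketch below uses a simple fixed-window count and a hypothetical rate_limited name, raising an error instead of sleeping when the limit is hit:

```python
import time
from functools import wraps

def rate_limited(calls, period):
    def decorator(func):
        state = {"count": 0, "window_start": time.monotonic()}

        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Start a new window once the period has elapsed
            if now - state["window_start"] >= period:
                state["count"] = 0
                state["window_start"] = now
            if state["count"] >= calls:
                raise RuntimeError("rate limit exceeded")
            state["count"] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(calls=2, period=60)
def ping():
    return "pong"

print(ping(), ping())  # pong pong; a third call inside the window raises
```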

Summary/Discussion

  • Method 1: Fixed Window Counter. This method is simple and straightforward. However, it can allow bursts of up to twice the limit around window boundaries, which might be undesirable for some systems.
  • Method 2: Sliding Window Log. Offers a more evenly distributed access over the window period and prevents traffic spikes. It’s more complex to implement than the fixed window counter.
  • Method 3: Token Bucket Algorithm. It allows for a burst up to the bucket capacity, afterwards limiting the rate to a steady pace defined by the token refill rate. This might not be suitable for all scenarios due to potential initial bursts.
  • Method 4: Leaky Bucket Algorithm. Processes requests at a consistent rate, preventing bursts entirely, but might be more complex to maintain than a simple rate limiter.
  • Method 5: Using Decorators. It’s an elegant solution that enables rate-limiting with minimal intrusion. However, it relies on third-party libraries and is less flexible for custom implementations.