TL;DR
  • Implement request throttling middleware to protect ASP.NET Core APIs from abuse and DoS attacks.
  • Use IMemoryCache for per-IP rate limiting in single-instance deployments; use distributed cache for multi-instance.
  • Apply different rate limits for anonymous, authenticated, and API key users.
  • Extract real client IPs using headers for accurate tracking behind proxies.
  • Return rate limit headers to help clients manage their usage.
  • Monitor, log, and tune rate limits based on real-world traffic and patterns.

Your API is getting hammered. Requests are pouring in faster than your server can handle, and legitimate users are getting timeouts while someone’s bot is making thousands of calls per second.

Sound familiar? You need request throttling, and you need it now.

Request throttling (also called rate limiting) protects your application from abuse while ensuring fair access for all users.

Instead of reaching for a third-party package, let’s build a custom middleware solution using ASP.NET Core’s built-in IMemoryCache that gives you complete control over your throttling logic.

Why Request Throttling Matters

Every public API faces the same challenge: how do you balance accessibility with protection? Without throttling, a single misbehaving client can consume all your server resources, effectively creating a denial-of-service attack that impacts everyone else.

Throttling solves several problems:

  • Prevents abuse from automated scripts and bots
  • Protects against accidental infinite loops in client code
  • Ensures fair resource distribution among users
  • Reduces server load and improves overall performance
  • Helps you stay within third-party API rate limits when your app makes external calls

The key is implementing throttling that’s tough on abusers but invisible to normal users.

Real-World Scenario: API Under Attack

Consider an e-commerce API that exposes product search endpoints. During a flash sale, traffic spikes dramatically.

Most users make a few searches per minute, but you notice some IP addresses making hundreds of requests per second: clearly automated scrapers trying to harvest your entire product catalog.

Without throttling, these scrapers consume most of your server’s CPU and memory, causing legitimate customers to experience slow response times or timeouts. Your conversion rates drop, and customer satisfaction plummets.

With proper per-IP throttling in place, you can limit each IP address to a reasonable number of requests (say, 100 per minute), allowing normal users to browse freely while blocking the scrapers.

Understanding Rate Limiting Strategies

Before we jump into code, let’s compare the most common rate limiting approaches:

| Strategy | Description | Pros | Cons | Best For |
| --- | --- | --- | --- | --- |
| Fixed Window | Count requests in fixed time intervals (e.g., per minute) | Simple to implement, predictable memory usage | Allows bursts at window boundaries | Basic protection, simple APIs |
| Sliding Window | Track requests in a rolling time window | Smoother rate limiting, prevents boundary bursts | More complex, higher memory usage | Production APIs, user-facing services |
| Token Bucket | Allow bursts up to a limit, then enforce steady rate | Handles natural usage patterns well | Complex to implement correctly | APIs with bursty but legitimate traffic |

For our middleware, we’ll implement a fixed window approach that’s simple yet effective for most scenarios.
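As a standalone illustration of the fixed-window idea (separate from the ASP.NET Core middleware itself), a counter keyed by client and window index can be sketched like this; the class and names are illustrative:

```csharp
using System;
using System.Collections.Concurrent;

// Minimal fixed-window counter sketch: the window index changes every
// WindowSize, which implicitly resets the count for each client.
public class FixedWindowLimiter
{
    private readonly int _maxRequests;
    private readonly TimeSpan _windowSize;
    private readonly ConcurrentDictionary<string, int> _counts = new();

    public FixedWindowLimiter(int maxRequests, TimeSpan windowSize)
    {
        _maxRequests = maxRequests;
        _windowSize = windowSize;
    }

    public bool IsAllowed(string clientId)
    {
        // Bucket requests by which window they fall into.
        long window = DateTimeOffset.UtcNow.Ticks / _windowSize.Ticks;
        string key = $"{clientId}:{window}";
        int count = _counts.AddOrUpdate(key, 1, (_, current) => current + 1);
        return count <= _maxRequests;
    }
}
```

A real implementation also needs to evict stale keys; in the middleware below, cache expiration handles that automatically.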

Request Flow Through Throttling Middleware

Here’s how requests flow through our throttling system:

[Diagram] Request Flow Through ASP.NET Core Throttling Middleware: How Per-IP Rate Limiting Works (extract the client IP, check the cache, increment the counter, and return 429 for rate-limited requests)

Building the Throttling Middleware

Let’s create a robust throttling middleware that handles the complexities of real-world scenarios:

using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using System.Net;

public class RequestThrottlingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IMemoryCache _cache;
    private readonly ILogger<RequestThrottlingMiddleware> _logger;
    private readonly RequestThrottlingOptions _options;

    public RequestThrottlingMiddleware(
        RequestDelegate next,
        IMemoryCache cache,
        ILogger<RequestThrottlingMiddleware> logger,
        IOptions<RequestThrottlingOptions> options) // Configure<T> registers IOptions<T>, not T
    {
        _next = next;
        _cache = cache;
        _logger = logger;
        _options = options.Value;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var clientIp = GetClientIpAddress(context);
        var cacheKey = $"throttle_{clientIp}";
        
        // Determine rate limit based on user authentication status
        var rateLimit = GetRateLimitForRequest(context);
        
        if (!_cache.TryGetValue(cacheKey, out RequestCounter counter))
        {
            counter = new RequestCounter();
            var cacheOptions = new MemoryCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = _options.WindowSize,
                Priority = CacheItemPriority.Low
            };
            _cache.Set(cacheKey, counter, cacheOptions);
        }

        // Check if request should be throttled
        if (counter.RequestCount >= rateLimit.MaxRequests)
        {
            _logger.LogWarning("Request throttled for IP {ClientIp}. " +
                "Request count: {RequestCount}, Limit: {MaxRequests}",
                clientIp, counter.RequestCount, rateLimit.MaxRequests);

            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            // Use the indexer rather than Add to avoid throwing if the header exists
            context.Response.Headers["Retry-After"] =
                ((int)_options.WindowSize.TotalSeconds).ToString();
            
            await context.Response.WriteAsync(
                "Too many requests. Please try again later.");
            return;
        }

        // Increment request counter
        Interlocked.Increment(ref counter.RequestCount);

        // Add rate limiting headers for client visibility
        context.Response.Headers["X-RateLimit-Limit"] =
            rateLimit.MaxRequests.ToString();
        context.Response.Headers["X-RateLimit-Remaining"] =
            Math.Max(0, rateLimit.MaxRequests - counter.RequestCount).ToString();

        await _next(context);
    }

    private string GetClientIpAddress(HttpContext context)
    {
        // Handle X-Forwarded-For header for clients behind proxies/load balancers
        var forwardedFor = context.Request.Headers["X-Forwarded-For"].FirstOrDefault();
        if (!string.IsNullOrEmpty(forwardedFor))
        {
            var ips = forwardedFor.Split(',', StringSplitOptions.RemoveEmptyEntries);
            if (ips.Length > 0 && IPAddress.TryParse(ips[0].Trim(), out _))
            {
                return ips[0].Trim();
            }
        }

        // Handle X-Real-IP header (common with Nginx)
        var realIp = context.Request.Headers["X-Real-IP"].FirstOrDefault();
        if (!string.IsNullOrEmpty(realIp) && IPAddress.TryParse(realIp, out _))
        {
            return realIp;
        }

        // Fall back to connection remote IP
        return context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
    }

    private RateLimit GetRateLimitForRequest(HttpContext context)
    {
        // Authenticated users get higher limits
        if (context.User.Identity?.IsAuthenticated == true)
        {
            return _options.AuthenticatedUserLimit;
        }

        // Check for API key in headers
        if (context.Request.Headers.ContainsKey("X-API-Key"))
        {
            return _options.ApiKeyUserLimit;
        }

        // Anonymous users get basic limits
        return _options.AnonymousUserLimit;
    }
}

public class RequestCounter
{
    // A public field (not a property) so Interlocked.Increment can take it by ref
    public int RequestCount;
}

public class RequestThrottlingOptions
{
    public TimeSpan WindowSize { get; set; } = TimeSpan.FromMinutes(1);
    public RateLimit AnonymousUserLimit { get; set; } = new() { MaxRequests = 60 };
    public RateLimit AuthenticatedUserLimit { get; set; } = new() { MaxRequests = 300 };
    public RateLimit ApiKeyUserLimit { get; set; } = new() { MaxRequests = 1000 };
}

public class RateLimit
{
    public int MaxRequests { get; set; }
}

Registering the Middleware

Add the middleware to your application pipeline in Program.cs. Order matters: place it early in the pipeline, but after any authentication middleware:

var builder = WebApplication.CreateBuilder(args);

// Add services
builder.Services.AddMemoryCache();
builder.Services.Configure<RequestThrottlingOptions>(options =>
{
    options.WindowSize = TimeSpan.FromMinutes(1);
    options.AnonymousUserLimit = new RateLimit { MaxRequests = 100 };
    options.AuthenticatedUserLimit = new RateLimit { MaxRequests = 500 };
    options.ApiKeyUserLimit = new RateLimit { MaxRequests = 2000 };
});

var app = builder.Build();

// Configure pipeline - order is important!
app.UseAuthentication(); // Must come before throttling to identify users
app.UseMiddleware<RequestThrottlingMiddleware>();
app.UseAuthorization();

app.MapControllers();
app.Run();

How the Middleware Works

The middleware operates on a simple but effective principle: track request counts per IP address within fixed time windows.

IP Address Extraction: The GetClientIpAddress method handles the complexity of identifying the real client IP, even when your application sits behind load balancers or reverse proxies. It checks X-Forwarded-For and X-Real-IP headers before falling back to the connection’s remote IP address.

Cache Management: Each IP address gets its own cache entry containing a request counter. The cache entry expires after the configured window size (default: 1 minute), automatically resetting the counter. This creates a fixed-window rate limiting system.

Differentiated Limits: The middleware applies different rate limits based on the request context. Authenticated users get higher limits than anonymous users, and API key holders get the highest limits. This approach encourages registration while still allowing reasonable anonymous access.

Thread Safety: The middleware uses Interlocked.Increment to safely update request counters in multi-threaded scenarios, preventing race conditions that could allow requests to slip through.

Informative Headers: Rate limiting headers (X-RateLimit-Limit, X-RateLimit-Remaining) help clients understand their current status and implement intelligent retry logic.
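On the client side, these headers enable straightforward retry logic. A minimal sketch using HttpClient, which retries once after the server-indicated delay (the helper name and fallback delay are illustrative):

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class ThrottleAwareClient
{
    // Retries a GET once after the server's Retry-After delay on a 429.
    public static async Task<HttpResponseMessage> GetWithRetryAsync(
        HttpClient client, string url)
    {
        var response = await client.GetAsync(url);
        if (response.StatusCode == (HttpStatusCode)429)
        {
            // Prefer the server's Retry-After hint; fall back to 1 second.
            var delay = response.Headers.RetryAfter?.Delta
                        ?? TimeSpan.FromSeconds(1);
            await Task.Delay(delay);
            response = await client.GetAsync(url);
        }
        return response;
    }
}
```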

Production Considerations

Memory Management

IMemoryCache stores data in your application’s memory, which means rate limiting data is lost when the application restarts. For most scenarios, this is acceptable since the cache expires quickly anyway. However, monitor memory usage in high-traffic applications.

Consider setting cache size limits. Note that once SizeLimit is configured, every cache entry must declare a Size, or Set will throw:

builder.Services.AddMemoryCache(options =>
{
    options.SizeLimit = 10000; // Limit total cache size, in units you assign per entry
});
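With a SizeLimit in place, the middleware's cache entry options need a corresponding Size (here, one unit per counter):

```csharp
var cacheOptions = new MemoryCacheEntryOptions
{
    AbsoluteExpirationRelativeToNow = _options.WindowSize,
    Priority = CacheItemPriority.Low,
    Size = 1 // Required once SizeLimit is configured on the cache
};
```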

Distributed Cache for Scale

If you’re running multiple application instances, IMemoryCache won’t share data between them. Each instance will have its own rate limiting counters, effectively multiplying your rate limits by the number of instances.

For true distributed rate limiting, swap IMemoryCache for IDistributedCache:

public class DistributedRequestThrottlingMiddleware
{
    private readonly IDistributedCache _cache;
    
    public async Task InvokeAsync(HttpContext context)
    {
        var clientIp = GetClientIpAddress(context);
        var cacheKey = $"throttle_{clientIp}";
        
        var counterJson = await _cache.GetStringAsync(cacheKey);
        var counter = string.IsNullOrEmpty(counterJson) 
            ? new RequestCounter() 
            : JsonSerializer.Deserialize<RequestCounter>(counterJson);
        
        // ... rest of throttling logic
        
        await _cache.SetStringAsync(cacheKey, 
            JsonSerializer.Serialize(counter),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = _options.WindowSize
            });
    }
}
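Note that the get-deserialize-set sequence above is not atomic: two instances can read the same count and both allow a request that should have been blocked. If Redis backs your distributed cache, an atomic counter avoids that race. A sketch using the StackExchange.Redis client (a package assumption; the class name is illustrative):

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public class RedisThrottleCounter
{
    private readonly IDatabase _db;
    private readonly TimeSpan _window;

    public RedisThrottleCounter(IConnectionMultiplexer redis, TimeSpan window)
    {
        _db = redis.GetDatabase();
        _window = window;
    }

    // Atomically increments the per-IP counter and returns the new count.
    public async Task<long> IncrementAsync(string clientIp)
    {
        var key = $"throttle_{clientIp}";
        long count = await _db.StringIncrementAsync(key);
        if (count == 1)
        {
            // First request in this window: start the expiration clock.
            await _db.KeyExpireAsync(key, _window);
        }
        return count;
    }
}
```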

Performance Optimization

The middleware adds minimal overhead to each request, but you can optimize further:

Skip Static Files: Don’t throttle requests for CSS, JS, images, or other static content:

public async Task InvokeAsync(HttpContext context)
{
    // Skip throttling for static files (note: checking for '.' is a rough
    // heuristic and will also match API routes that contain dots)
    if (context.Request.Path.StartsWithSegments("/static") ||
        context.Request.Path.Value?.Contains('.') == true)
    {
        await _next(context);
        return;
    }
    
    // ... throttling logic
}

Whitelist Internal IPs: Skip throttling for requests from your own infrastructure:

private bool IsWhitelistedIp(string ipAddress)
{
    var whitelistedRanges = new[]
    {
        "127.0.0.0/8",      // Localhost
        "10.0.0.0/8",       // Private network
        "172.16.0.0/12",    // Private network
        "192.168.0.0/16"    // Private network
    };

    if (!IPAddress.TryParse(ipAddress, out var ip))
    {
        return false;
    }

    // IPNetwork is available in .NET 8+; on earlier targets, implement
    // the prefix comparison manually or use a third-party library.
    return whitelistedRanges.Any(range => IPNetwork.Parse(range).Contains(ip));
}

Monitoring and Alerting

Add logging and metrics to understand your throttling patterns:

private void LogThrottlingMetrics(string clientIp, int requestCount, int limit)
{
    _logger.LogInformation("Throttling stats - IP: {ClientIp}, " +
        "Requests: {RequestCount}/{Limit}", 
        clientIp, requestCount, limit);
    
    // Send metrics to your monitoring system
    // Example: Application Insights, DataDog, etc.
}

Watch for patterns that might indicate:

  • Legitimate users hitting limits (consider increasing thresholds)
  • Coordinated attacks from multiple IPs
  • Unusual traffic spikes that need investigation

Best Practices

Start Conservative: Begin with generous rate limits and tighten them based on actual usage patterns. It’s easier to reduce limits than to deal with angry users who were incorrectly blocked.

Provide Clear Error Messages: When throttling requests, return helpful error messages that explain the limit and when users can try again. The Retry-After header tells clients exactly when to retry.

Consider Business Logic: Different endpoints might need different limits. A search endpoint can handle more requests than a resource-intensive report generation endpoint.
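One way to express per-endpoint limits is a path-prefix lookup consulted before the IP counter is checked; the paths and numbers below are illustrative:

```csharp
using System;
using System.Collections.Generic;

public static class EndpointLimits
{
    // Illustrative per-endpoint limits; tune these for your own routes.
    private static readonly List<(string Prefix, int MaxRequests)> Limits = new()
    {
        ("/api/reports", 10),   // Expensive report generation
        ("/api/search", 120),   // Cheap, cacheable search
    };

    // Returns the limit for the first matching prefix, or the default.
    public static int Resolve(string path, int defaultLimit)
    {
        foreach (var (prefix, max) in Limits)
        {
            if (path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                return max;
            }
        }
        return defaultLimit;
    }
}
```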

Plan for Legitimate Bursts: Real users sometimes generate legitimate bursts of requests. Consider implementing a token bucket algorithm if your application needs to handle these gracefully.
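To show the token bucket idea without wiring it into the middleware, here is a minimal sketch: tokens refill at a steady rate, and each request spends one, so short bursts succeed while sustained traffic is smoothed out.

```csharp
using System;

public class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private DateTime _lastRefill = DateTime.UtcNow;
    private readonly object _lock = new();

    public TokenBucket(double capacity, double refillPerSecond)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity; // Start full so an initial burst is allowed
    }

    public bool TryConsume()
    {
        lock (_lock)
        {
            // Refill based on elapsed time, capped at capacity.
            var now = DateTime.UtcNow;
            _tokens = Math.Min(_capacity,
                _tokens + (now - _lastRefill).TotalSeconds * _refillPerSecond);
            _lastRefill = now;

            if (_tokens >= 1)
            {
                _tokens -= 1;
                return true;
            }
            return false;
        }
    }
}
```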

Test Your Limits: Use load testing tools to verify your throttling works correctly under stress and doesn’t have unintended side effects.

Consistent Response Format: Ensure your throttling responses are consistent across all endpoints. This helps clients implement retry logic uniformly.

Wrapping Up

Request throttling is essential for any production API, and building your own middleware gives you complete control over the throttling logic. The IMemoryCache-based approach we’ve built handles the most common scenarios while remaining simple and performant.

This middleware protects your application from abuse while maintaining excellent performance for legitimate users. The differentiated rate limiting ensures authenticated users get better service than anonymous visitors, encouraging registration without blocking reasonable anonymous usage.

Remember to monitor your throttling metrics and adjust limits based on real-world usage patterns. What starts as protection against obvious abuse can evolve into a sophisticated system that optimizes resource allocation across your entire user base.

The beauty of custom middleware is that you can extend it as your needs grow, add more sophisticated algorithms, integrate with external rate limiting services, or implement complex business rules that off-the-shelf solutions can’t handle.

Frequently Asked Questions

What is request throttling middleware in ASP.NET Core and why is it important?

Request throttling middleware limits the number of requests a client can make in a given time window, protecting APIs from abuse, denial-of-service attacks, and resource exhaustion. It ensures fair usage and improves reliability for all users.

How does per-IP rate limiting work in ASP.NET Core middleware?

Per-IP rate limiting tracks the number of requests from each client IP address within a fixed window (e.g., 1 minute). If the request count exceeds the configured limit, the middleware returns a 429 status code and a Retry-After header. Example:

if (counter.RequestCount >= rateLimit.MaxRequests) {
    context.Response.StatusCode = 429;
    context.Response.Headers.Add("Retry-After", ...);
    await context.Response.WriteAsync("Too many requests.");
    return;
}

What are the main strategies for rate limiting in APIs?

Common strategies include fixed window, sliding window, and token bucket. Fixed window is simple and predictable, sliding window smooths out bursts, and token bucket allows short bursts while enforcing a steady rate.

How do you implement request throttling using IMemoryCache in ASP.NET Core?

Use IMemoryCache to store a request counter per IP with an expiration matching the rate limit window. Increment the counter on each request and reset it when the window expires. Example:

if (!_cache.TryGetValue(cacheKey, out RequestCounter counter)) {
    counter = new RequestCounter();
    _cache.Set(cacheKey, counter, cacheOptions);
}
Interlocked.Increment(ref counter.RequestCount);

How can you differentiate rate limits for anonymous, authenticated, and API key users?

Check the request context for authentication or API key headers and apply different rate limits accordingly. This allows higher limits for trusted users and stricter limits for anonymous traffic.

What are the limitations of using IMemoryCache for rate limiting in distributed environments?

IMemoryCache is local to each application instance, so rate limits are not shared across multiple servers. For distributed scenarios, use IDistributedCache or an external cache like Redis to synchronize counters.

How do you handle real client IP extraction behind proxies or load balancers?

Check headers like X-Forwarded-For and X-Real-IP to extract the true client IP. Fallback to context.Connection.RemoteIpAddress if headers are missing. Example:

var forwardedFor = context.Request.Headers["X-Forwarded-For"].FirstOrDefault();

How can you skip throttling for static files or internal IPs?

Add logic to bypass throttling for requests to static file paths or from whitelisted IP ranges. This prevents unnecessary rate limiting for assets and trusted infrastructure.

What headers should you return to help clients handle rate limits?

Return X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After headers. These inform clients of their quota and when they can retry, enabling better client-side handling.

How do you monitor and tune rate limiting in production?

Log throttling events, track metrics, and adjust limits based on real usage patterns. Use monitoring tools to detect abuse, legitimate bursts, or misconfigured thresholds, and refine your strategy as needed.