
Token bucket rate limiting not working as expected.

Arjun J 0 Reputation points
2026-03-27T11:14:20.7033333+00:00

Hello, I am facing an issue with token bucket rate limiting. For testing purposes I set the token limit to 5 and the replenishment amount to 3 tokens every 2 minutes, with partitions based on userId. I wrote a small client that calls an API (the per-user policy is applied on its controller) in a loop, over 50 times:

for (int i = 1; i <= 50; i++)
{
    var response = await client.GetAsync("api/Account/all");
    Console.WriteLine($"Request {i}: {(int)response.StatusCode}");
    await Task.Delay(100);
}

The first 5 requests work as expected, returning 200, and the 6th gets 429. But somewhere between the 20th and 30th request, one request gets 200 and the rest are 429. If the tokens were being replenished, shouldn't the next two requests also get 200? I cannot understand this behaviour. There is one other odd behaviour: when I set the period to 1 minute, the 6th request gets 200 and from the 7th onward it gets 429.

I am running the web API locally on my system and I have verified that the userId is populated on each request; it is. I am lost, any help is much appreciated.


  
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy(GeneralConstants.PER_USER_RATE_LIMIT_POLICY, httpContext =>
    {
        string? userId = httpContext.User.FindFirst(ClaimTypes.NameIdentifier)?.Value;

        if (!string.IsNullOrWhiteSpace(userId))
        {
            return RateLimitPartition.GetTokenBucketLimiter(
                userId,
                _ => new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 5,
                    ReplenishmentPeriod = TimeSpan.FromMinutes(2),
                    TokensPerPeriod = 3,
                    AutoReplenishment = true
                });
        }

        return RateLimitPartition.GetFixedWindowLimiter(
            GeneralConstants.ANONYMOUS,
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 120,
                Window = TimeSpan.FromSeconds(30)
            });
    });

    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, token) =>
    {
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter = $"{retryAfter.TotalSeconds}";

            ProblemDetailsFactory problemDetailsFactory = context.HttpContext.RequestServices.GetRequiredService<ProblemDetailsFactory>();
            ProblemDetails problemDetails = problemDetailsFactory.CreateProblemDetails(
                context.HttpContext,
                StatusCodes.Status429TooManyRequests,
                "Too Many Requests",
                detail: $"Too many requests. Please try again after {retryAfter.TotalSeconds} seconds.");

            await context.HttpContext.Response.WriteAsJsonAsync(problemDetails, token);
        }
    };
});

2 answers

Sort by: Most helpful
  1. Jack Dang (WICLOUD CORPORATION) 15,870 Reputation points Microsoft External Staff Moderator
    2026-03-30T07:02:56.33+00:00

    Hi @Arjun J ,

    Thanks for reaching out.

    Looking at your setup and test, here’s what’s happening:

    With your token bucket configuration:

    • TokenLimit = 5
    • TokensPerPeriod = 3
    • ReplenishmentPeriod = 2 minutes (or 1 minute in your other test)

    …and your test loop of 50 requests with 100ms between requests, the entire run only takes around 5 seconds. That’s much shorter than either replenishment period, so no tokens should be replenished during this test.

    This means that any 200 response after the first 5 requests is not explained by normal token replenishment within a single run. Likely causes include:

    • The requests may not all be using the same authenticated identity.
    • The tested endpoint may not always be hitting the same rate-limited path or policy.
    • Another client or request source may be affecting what you observe.

    If you’ve verified that the same userId is used for every request, it would be helpful to log the actual partition key and confirm that the same policy is applied on every request. This can help identify if some requests are bypassing the per-user limiter.
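
    One way to do that logging (a sketch only; the logger plumbing here is my assumption, and the `GeneralConstants` values are from the original post) is to resolve an `ILogger` inside the policy callback and log the partition key before returning the partition:

    ```csharp
    options.AddPolicy(GeneralConstants.PER_USER_RATE_LIMIT_POLICY, httpContext =>
    {
        string? userId = httpContext.User.FindFirst(ClaimTypes.NameIdentifier)?.Value;

        // Log which partition each request actually lands in. If some requests show
        // the anonymous partition (or a different userId), they are not sharing the
        // per-user token bucket, which would explain the stray 200s.
        ILogger logger = httpContext.RequestServices
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("RateLimiting");
        logger.LogInformation("Rate limit partition: {Partition}",
            string.IsNullOrWhiteSpace(userId) ? GeneralConstants.ANONYMOUS : userId);

        // ...then return the same GetTokenBucketLimiter / GetFixedWindowLimiter
        // partitions as in the original code.
    });
    ```

    Comparing these log lines with the 200/429 sequence should show immediately whether every request really goes through the same token bucket.
    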

    Hope this helps! If my answer was helpful, I would greatly appreciate it if you could follow the instructions here so others with the same problem can benefit as well.


  2. Q&A Assist
    2026-03-27T11:14:58.1266667+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    The behavior comes from how TokenBucketRateLimiter works internally and how the ASP.NET Core rate limiting middleware uses it, not from a bug in the partitioning by userId.

    Key points about TokenBucketRateLimiter relevant to this behavior:

    1. Tokens are replenished on a timer, not per request
      • With AutoReplenishment = true, an internal timer replenishes tokens every ReplenishmentPeriod.
      • In the docs, when AutoReplenishment is true, tokens are added every ReplenishmentPeriod; when false, the app must call TryReplenish manually.
      • With a 2‑minute period and ~100 ms between requests, a fresh 50‑request loop finishes well inside one period, so a replenishment can only land mid‑loop if the run is stretched out (slow responses, debugger pauses) or if the limiter, and hence its timer cycle, was created by an earlier run against the same server process. When the timer does fire, it adds TokensPerPeriod (3) tokens back into the bucket.
    2. Requests are evaluated one by one, and timing matters
      • Each request is evaluated at the moment it arrives.
      • In the test loop, there is a Task.Delay(100) between requests, so they are spaced ~100 ms apart.
      • When the replenishment timer fires, whichever request happens to arrive just after that moment will see the newly added tokens and succeed (HTTP 200).
      • Subsequent requests may arrive after the replenished tokens have already been consumed by one or more earlier requests, so they can still get 429.
      • Because the requests are not perfectly simultaneous, the 3 replenished tokens can be consumed by up to 3 separate requests that happen to arrive around that time. Depending on exact timing, you might observe only one visible 200 in your console if the others are earlier/later than you expect.
    3. RetryAfter is an estimate, not a guarantee
      • The docs note that retryAfter metadata is available for token bucket, fixed window, and sliding window limiters because these algorithms can estimate when more permits are added.
      • It is not a strict guarantee that exactly at that time there will be a specific number of tokens available; concurrent requests can consume them immediately.
    4. Why “only one 200” between 20th and 30th requests?
      • Start: TokenLimit = 5 → first 5 requests get 200.
      • Requests 6–N: bucket is empty → 429.
      • After 2 minutes, the timer adds 3 tokens.
      • Suppose request 23 arrives just after replenishment → it gets 200 and consumes 1 token (2 left).
      • Requests 24 and 25 might also get 200 if they arrive before any other requests consume the remaining tokens. But due to timing and logging, it can look like only one request got 200 in that range.
      • If the loop is still running when the next 2‑minute period passes, another 3 tokens are added, and the same pattern repeats.
    5. Why does changing ReplenishmentPeriod to 1 minute change which request is 200/429?
      • With a 1‑minute period, replenishment happens earlier relative to your 50‑request loop.
      • With 100 ms between requests, the 6th request is sent well under a minute into the loop, so a freshly created limiter cannot have replenished by then; the initial 5 tokens are exhausted and it gets 429.
      • However, partition limiters are cached for the lifetime of the server process. If the limiter (and its replenishment timer) was created by an earlier test run, a replenishment can land just before the 6th request of a new run; that request then sees replenished tokens and gets 200, and subsequent ones get 429 once those tokens are consumed.
      • Small changes in timing (CPU scheduling, debugger attached, etc.) can shift exactly which request index sees the replenished tokens.
    6. Concurrency and client/server timing
      • The server’s internal replenishment timer is independent of the client loop.
      • Even in a simple loop with Task.Delay(100), the actual arrival times at the server can vary.
      • If there are any concurrent requests (e.g., browser, health checks, or other clients), they also consume tokens from the same user partition.
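
    The token arithmetic in points 1–4 can be checked directly against `TokenBucketRateLimiter` from `System.Threading.RateLimiting`, with no HTTP or middleware in the way. A minimal console sketch (assumes .NET 7+; the period is shortened to 1 second so a single replenishment is visible):

    ```csharp
    using System.Threading.RateLimiting;

    var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
    {
        TokenLimit = 5,                                // bucket starts full with 5 tokens
        TokensPerPeriod = 3,                           // each replenishment adds 3, not a full refill
        ReplenishmentPeriod = TimeSpan.FromSeconds(1), // shortened from 2 minutes for the demo
        AutoReplenishment = true,
        QueueLimit = 0                                 // reject immediately instead of queueing
    });

    // Burst: the first 5 acquisitions drain the bucket, the 6th is rejected.
    for (int i = 1; i <= 6; i++)
        Console.WriteLine($"Request {i}: {(limiter.AttemptAcquire().IsAcquired ? 200 : 429)}");

    // Wait a bit more than one period so the replenishment timer fires once.
    await Task.Delay(1500);

    // Only TokensPerPeriod (3) tokens came back, so 3 more 200s, then 429 again.
    for (int i = 7; i <= 11; i++)
        Console.WriteLine($"Request {i}: {(limiter.AttemptAcquire().IsAcquired ? 200 : 429)}");
    ```

    Running this shows the same shape as the reported test: a burst of TokenLimit successes, then a smaller burst of TokensPerPeriod successes after each period, never a full refill.
    
    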

    To better observe the behavior:

    • Log the remaining permits on each request (for example via RateLimiter.GetStatistics(), available in .NET 8+, or by wrapping the limiter in a custom policy) to see exactly when replenishment occurs.
    • Shorten ReplenishmentPeriod to a few seconds and reduce TokenLimit/TokensPerPeriod so the pattern is easier to see.
    • Run a controlled test where all 50 requests are fired concurrently (e.g., Parallel.ForEachAsync) and compare with the sequential Task.Delay(100) test; the distribution of 200/429 responses will differ.
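
    As a sketch of that last comparison, the same 50 acquisitions can be fired concurrently against the limiter type in-process (assumes .NET 7+; no HTTP involved). The total number of 200s is fixed by the token count; only which requests win changes with scheduling:

    ```csharp
    using System.Linq;
    using System.Threading.RateLimiting;

    var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
    {
        TokenLimit = 5,
        TokensPerPeriod = 3,
        ReplenishmentPeriod = TimeSpan.FromMinutes(2), // far longer than the test, so no replenishment
        AutoReplenishment = true,
        QueueLimit = 0
    });

    // Fire all 50 acquisitions concurrently instead of sequentially.
    int[] results = new int[50];
    Parallel.For(0, 50, i =>
        results[i] = limiter.AttemptAcquire().IsAcquired ? 200 : 429);

    // Exactly 5 succeed, but WHICH 5 depends on thread scheduling, not on index order.
    Console.WriteLine($"200s: {results.Count(r => r == 200)}, 429s: {results.Count(r => r == 429)}");
    // → 200s: 5, 429s: 45
    ```

    In the sequential Task.Delay(100) version the 200s are always the first five indices; in the concurrent version they are scattered, which makes the "arrival time decides who gets a token" behaviour easy to see.
    
    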

    The implementation is consistent with the documented behavior of TokenBucketRateLimiter: tokens are added periodically, and each request consumes one token at its arrival time. The “random” 200s are simply the requests that happen to arrive just after a replenishment event, and the fact that not all three replenished tokens show up as consecutive 200s in the console is due to timing and how quickly subsequent requests consume the remaining tokens.


