
Token bucket rate limiting not working as expected.

Arjun J 0 Reputation points
2026-03-27T11:14:20.7033333+00:00

Hello, I am facing an issue with token bucket rate limiting. For testing purposes I set the token limit to 5 and the replenishment amount to 3 tokens every 2 minutes, with partitions based on userId. I wrote a small client that calls an API (the per-user policy is applied on its controller) in a loop, over 50 times:

for (int i = 1; i <= 50; i++)
{
    var response = await client.GetAsync("api/Account/all");
    Console.WriteLine($"Request {i}: {(int)response.StatusCode}");
    await Task.Delay(100);
}

The first 5 requests work as expected, returning 200, and the 6th gets 429. But somewhere between the 20th and 30th request, one request gets 200 and the rest are 429. If the tokens were being replenished, shouldn't the next two requests also get 200? I cannot understand this behaviour. There is one other odd behaviour: when I set the period to 1 minute, the 6th request gets 200 and from the 7th onward it gets 429.

I am running the web API locally on my system and I have verified that the userId is populated on each request; it is. I am lost, any help is much appreciated.


  
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy(GeneralConstants.PER_USER_RATE_LIMIT_POLICY, httpContext =>
    {
        string? userId = httpContext.User.FindFirst(ClaimTypes.NameIdentifier)?.Value;

        if (!string.IsNullOrWhiteSpace(userId))
        {
            return RateLimitPartition.GetTokenBucketLimiter(
                userId,
                _ => new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 5,
                    ReplenishmentPeriod = TimeSpan.FromMinutes(2),
                    TokensPerPeriod = 3,
                    AutoReplenishment = true
                });
        }

        return RateLimitPartition.GetFixedWindowLimiter(
            GeneralConstants.ANONYMOUS,
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 120,
                Window = TimeSpan.FromSeconds(30)
            });
    });

    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, token) =>
    {
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter = $"{retryAfter.TotalSeconds}";

            ProblemDetailsFactory problemDetailsFactory = context.HttpContext.RequestServices.GetRequiredService<ProblemDetailsFactory>();
            ProblemDetails problemDetails = problemDetailsFactory.CreateProblemDetails(
                context.HttpContext,
                StatusCodes.Status429TooManyRequests,
                "Too Many Requests",
                detail: $"Too many requests. Please try again after {retryAfter.TotalSeconds} seconds.");

            await context.HttpContext.Response.WriteAsJsonAsync(problemDetails, token);
        }
    };
});

2 answers

Sort by: Most helpful
  1. Jack Dang (WICLOUD CORPORATION) 15,870 Reputation points Microsoft External Staff Moderator
    2026-03-30T07:02:56.33+00:00

    Hi @Arjun J ,

    Thanks for reaching out.

    Looking at your setup and test, here’s what’s happening:

    With your token bucket configuration:

    • TokenLimit = 5
    • TokensPerPeriod = 3
    • ReplenishmentPeriod = 2 minutes (or 1 minute in your other test)

    …and your test loop of 50 requests with 100ms between requests, the entire run only takes around 5 seconds. That’s much shorter than either replenishment period, so no tokens should be replenished during this test.

    This means that any 200 response after the first 5 requests is not explained by normal token replenishment within a single run. Likely causes include:

    • The requests may not all be using the same authenticated identity.
    • The tested endpoint may not always be hitting the same rate-limited path or policy.
    • Another client or request source may be affecting what you observe.

    If you’ve verified that the same userId is used for every request, it would be helpful to log the actual partition key and confirm that the same policy is applied on every request. This can help identify if some requests are bypassing the per-user limiter.
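
    One way to do that logging (a sketch only; the logger plumbing here is my assumption, and the `GeneralConstants` values are from the original post) is to resolve an `ILogger` inside the policy callback and log the partition key before returning the partition:

    ```csharp
    options.AddPolicy(GeneralConstants.PER_USER_RATE_LIMIT_POLICY, httpContext =>
    {
        string? userId = httpContext.User.FindFirst(ClaimTypes.NameIdentifier)?.Value;

        // Log which partition each request actually lands in. If some requests show
        // the anonymous partition (or a different userId), they are not sharing the
        // per-user token bucket, which would explain the stray 200s.
        ILogger logger = httpContext.RequestServices
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("RateLimiting");
        logger.LogInformation("Rate limit partition: {Partition}",
            string.IsNullOrWhiteSpace(userId) ? GeneralConstants.ANONYMOUS : userId);

        // ...then return the same GetTokenBucketLimiter / GetFixedWindowLimiter
        // partitions as in the original code.
    });
    ```

    Comparing these log lines with the 200/429 sequence should show immediately whether every request really goes through the same token bucket.
    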

    Hope this helps! If my answer was helpful, I would greatly appreciate it if you could follow the instructions here so others with the same problem can benefit as well.


  2. Q&A Assist
    2026-03-27T11:14:58.1266667+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    The behavior comes from how TokenBucketRateLimiter works internally and how the ASP.NET Core rate limiting middleware uses it, not from a bug in the partitioning by userId.

    Key points about TokenBucketRateLimiter relevant to this behavior:

    1. Tokens are replenished on a timer, not per request
      • With AutoReplenishment = true, an internal timer replenishes tokens every ReplenishmentPeriod.
      • In the docs, when AutoReplenishment is true, tokens are added every ReplenishmentPeriod; when false, the app must call TryReplenish manually.
      • With a 2‑minute period and ~100 ms between requests, a fresh 50‑request loop finishes well inside one period, so a replenishment can only land mid‑loop if the run is stretched out (slow responses, debugger pauses) or if the limiter, and hence its timer cycle, was created by an earlier run against the same server process. When the timer does fire, it adds TokensPerPeriod (3) tokens back into the bucket.
    2. Requests are evaluated one by one, and timing matters
      • Each request is evaluated at the moment it arrives.
      • In the test loop, there is a Task.Delay(100) between requests, so they are spaced ~100 ms apart.
      • When the replenishment timer fires, whichever request happens to arrive just after that moment will see the newly added tokens and succeed (HTTP 200).
      • Subsequent requests may arrive after the replenished tokens have already been consumed by one or more earlier requests, so they can still get 429.
      • Because the requests are not perfectly simultaneous, the 3 replenished tokens can be consumed by up to 3 separate requests that happen to arrive around that time. Depending on exact timing, you might observe only one visible 200 in your console if the others are earlier/later than you expect.
    3. RetryAfter is an estimate, not a guarantee
      • The docs note that retryAfter metadata is available for token bucket, fixed window, and sliding window limiters because these algorithms can estimate when more permits are added.
      • It is not a strict guarantee that exactly at that time there will be a specific number of tokens available; concurrent requests can consume them immediately.
    4. Why “only one 200” between 20th and 30th requests?
      • Start: TokenLimit = 5 → first 5 requests get 200.
      • Requests 6–N: bucket is empty → 429.
      • After 2 minutes, the timer adds 3 tokens.
      • Suppose request 23 arrives just after replenishment → it gets 200 and consumes 1 token (2 left).
      • Requests 24 and 25 might also get 200 if they arrive before any other requests consume the remaining tokens. But due to timing and logging, it can look like only one request got 200 in that range.
      • If the loop is still running when the next 2‑minute period passes, another 3 tokens are added, and the same pattern repeats.
    5. Why does changing ReplenishmentPeriod to 1 minute change which request is 200/429?
      • With a 1‑minute period, replenishment happens earlier relative to your 50‑request loop.
      • With 100 ms between requests, the 6th request is sent well under a minute into the loop, so a freshly created limiter cannot have replenished by then; the initial 5 tokens are exhausted and it gets 429.
      • However, partition limiters are cached for the lifetime of the server process. If the limiter (and its replenishment timer) was created by an earlier test run, a replenishment can land just before the 6th request of a new run; that request then sees replenished tokens and gets 200, and subsequent ones get 429 once those tokens are consumed.
      • Small changes in timing (CPU scheduling, debugger attached, etc.) can shift exactly which request index sees the replenished tokens.
    6. Concurrency and client/server timing
      • The server’s internal replenishment timer is independent of the client loop.
      • Even in a simple loop with Task.Delay(100), the actual arrival times at the server can vary.
      • If there are any concurrent requests (e.g., browser, health checks, or other clients), they also consume tokens from the same user partition.
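
    The token arithmetic in points 1–4 can be checked directly against `TokenBucketRateLimiter` from `System.Threading.RateLimiting`, with no HTTP or middleware in the way. A minimal console sketch (assumes .NET 7+; the period is shortened to 1 second so a single replenishment is visible):

    ```csharp
    using System.Threading.RateLimiting;

    var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
    {
        TokenLimit = 5,                                // bucket starts full with 5 tokens
        TokensPerPeriod = 3,                           // each replenishment adds 3, not a full refill
        ReplenishmentPeriod = TimeSpan.FromSeconds(1), // shortened from 2 minutes for the demo
        AutoReplenishment = true,
        QueueLimit = 0                                 // reject immediately instead of queueing
    });

    // Burst: the first 5 acquisitions drain the bucket, the 6th is rejected.
    for (int i = 1; i <= 6; i++)
        Console.WriteLine($"Request {i}: {(limiter.AttemptAcquire().IsAcquired ? 200 : 429)}");

    // Wait a bit more than one period so the replenishment timer fires once.
    await Task.Delay(1500);

    // Only TokensPerPeriod (3) tokens came back, so 3 more 200s, then 429 again.
    for (int i = 7; i <= 11; i++)
        Console.WriteLine($"Request {i}: {(limiter.AttemptAcquire().IsAcquired ? 200 : 429)}");
    ```

    Running this shows the same shape as the reported test: a burst of TokenLimit successes, then a smaller burst of TokensPerPeriod successes after each period, never a full refill.
    
    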

    To better observe the behavior:

    • Log the remaining permits on each request (for example via RateLimiter.GetStatistics(), available in .NET 8+, or by wrapping the limiter in a custom policy) to see exactly when replenishment occurs.
    • Shorten ReplenishmentPeriod to a few seconds and reduce TokenLimit/TokensPerPeriod so the pattern is easier to see.
    • Run a controlled test where all 50 requests are fired concurrently (e.g., Parallel.ForEachAsync) and compare with the sequential Task.Delay(100) test; the distribution of 200/429 responses will differ.
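
    As a sketch of that last comparison, the same 50 acquisitions can be fired concurrently against the limiter type in-process (assumes .NET 7+; no HTTP involved). The total number of 200s is fixed by the token count; only which requests win changes with scheduling:

    ```csharp
    using System.Linq;
    using System.Threading.RateLimiting;

    var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
    {
        TokenLimit = 5,
        TokensPerPeriod = 3,
        ReplenishmentPeriod = TimeSpan.FromMinutes(2), // far longer than the test, so no replenishment
        AutoReplenishment = true,
        QueueLimit = 0
    });

    // Fire all 50 acquisitions concurrently instead of sequentially.
    int[] results = new int[50];
    Parallel.For(0, 50, i =>
        results[i] = limiter.AttemptAcquire().IsAcquired ? 200 : 429);

    // Exactly 5 succeed, but WHICH 5 depends on thread scheduling, not on index order.
    Console.WriteLine($"200s: {results.Count(r => r == 200)}, 429s: {results.Count(r => r == 429)}");
    // → 200s: 5, 429s: 45
    ```

    In the sequential Task.Delay(100) version the 200s are always the first five indices; in the concurrent version they are scattered, which makes the "arrival time decides who gets a token" behaviour easy to see.
    
    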

    The implementation is consistent with the documented behavior of TokenBucketRateLimiter: tokens are added periodically, and each request consumes one token at its arrival time. The “random” 200s are simply the requests that happen to arrive just after a replenishment event, and the fact that not all three replenished tokens show up as consecutive 200s in the console is due to timing and how quickly subsequent requests consume the remaining tokens.


