Experiencing slow responses (~ 20 Tokens/s) on responses API

Marius Riesle 0 Reputation points
2026-04-15T15:41:19.4233333+00:00

We are currently experiencing very slow responses from the Responses API with the gpt-5-mini model hosted in Sweden Central.

We usually reach ~70 tokens/s, but now we barely reach ~20 tokens/s. We ran multiple tests and have seen this issue since yesterday. Other models, such as gpt-5.4-mini, do not seem to be affected.

Can anyone else validate this, and is there anything we can do about it?

Azure OpenAI Service

An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.


1 answer

  1. Amira Bedhiafi 41,291 Reputation points Volunteer Moderator
    2026-04-15T18:31:52.06+00:00

    Hello Marius!

    Thank you for posting on MS Learn Q&A.

    I saw the same issue reported in Sweden Central, where gpt-5-mini and gpt-5-nano were consistently slower than a larger GPT-5 model even with high TPM allocated and the same middleware path:

    https://learn.microsoft.com/de-at/answers/questions/5846211/azure-openai-higher-latency-for-gpt-5-mini-and-gpt

    If you follow Microsoft's Azure OpenAI latency guidance, response time is affected not only by the model and token counts but also by the overall load on the deployment and the system. A model-specific or region-specific capacity issue can therefore absolutely show up as lower tokens/sec without any code change on your side.

    What you should do is treat this as a potential regional or model capacity regression first, especially since you already compared against another model in the same region and found only gpt-5-mini degraded. Gather the relevant evidence and open an Azure support ticket.
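    To gather that evidence, it helps to measure throughput the same way every time. The sketch below is a minimal, generic helper that times an iterable of streamed text deltas; the fake stream at the bottom only simulates the Responses API output, so wiring it to your actual Azure deployment (client setup, model name, event handling) is left to you and will differ from this assumption.

```python
import time

def measure_stream_throughput(chunks):
    """Consume an iterable of streamed text deltas and return
    (total_chars, elapsed_seconds, chars_per_second)."""
    start = time.perf_counter()
    total = 0
    for delta in chunks:
        total += len(delta)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate

if __name__ == "__main__":
    # Stand-in for the real streamed response; replace with the text
    # deltas you receive from your Azure OpenAI deployment.
    def fake_stream():
        for word in ["hello ", "world ", "from ", "sweden ", "central"]:
            time.sleep(0.01)  # simulate per-chunk network latency
            yield word

    total, elapsed, rate = measure_stream_throughput(fake_stream())
    print(f"{total} chars in {elapsed:.3f}s -> {rate:.0f} chars/s")
```

    Characters per second is a proxy; for true tokens/sec, divide the output token count reported in the response's usage data by the same wall-clock time. Recording a few of these runs per model and region, with timestamps, gives the support team something concrete to compare.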

