Experiencing slow responses (~ 20 Tokens/s) on responses API

Marius Riesle 0 Reputation points
2026-04-15T15:41:19.4233333+00:00

We are currently experiencing very slow responses from the Responses API with the gpt-5-mini model hosted in Sweden Central.

We usually reach ~70 tokens/s, but now we barely reach ~20 tokens/s. We ran multiple tests and have seen this issue since yesterday. Other models, such as gpt-5.4-mini, do not seem to be affected.

Can anyone else validate this, and is there anything we can do about it?

Azure OpenAI Service

An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.


1 answer

  1. Amira Bedhiafi 41,291 Reputation points Volunteer Moderator
    2026-04-15T18:31:52.06+00:00

    Hello Marius!

    Thank you for posting on MS Learn Q&A.

    I saw the same issue reported in Sweden Central, where gpt-5-mini and gpt-5-nano were consistently slower than a larger GPT-5 model even with high TPM allocated and the same middleware path:

    https://learn.microsoft.com/de-at/answers/questions/5846211/azure-openai-higher-latency-for-gpt-5-mini-and-gpt

    If you follow Microsoft's Azure OpenAI latency guidance, response time is affected not only by the model and token counts but also by the overall load on the deployment and the system. A model-specific or region-specific capacity issue can therefore absolutely show up as lower tokens/sec without any code change on your side.

    What you should do is treat this as a potential regional or model capacity regression first, especially since you already compared against another model in the same region and found only gpt-5-mini degraded. Gather the relevant evidence and open an Azure support ticket.
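    To gather that evidence, it helps to measure throughput the same way every time. The sketch below is a minimal, generic helper that times an iterable of streamed text deltas; the fake stream at the bottom only simulates the Responses API output, so wiring it to your actual Azure deployment (client setup, model name, event handling) is left to you and will differ from this assumption.

```python
import time

def measure_stream_throughput(chunks):
    """Consume an iterable of streamed text deltas and return
    (total_chars, elapsed_seconds, chars_per_second)."""
    start = time.perf_counter()
    total = 0
    for delta in chunks:
        total += len(delta)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate

if __name__ == "__main__":
    # Stand-in for the real streamed response; replace with the text
    # deltas you receive from your Azure OpenAI deployment.
    def fake_stream():
        for word in ["hello ", "world ", "from ", "sweden ", "central"]:
            time.sleep(0.01)  # simulate per-chunk network latency
            yield word

    total, elapsed, rate = measure_stream_throughput(fake_stream())
    print(f"{total} chars in {elapsed:.3f}s -> {rate:.0f} chars/s")
```

    Characters per second is a proxy; for true tokens/sec, divide the output token count reported in the response's usage data by the same wall-clock time. Recording a few of these runs per model and region, with timestamps, gives the support team something concrete to compare.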

