Hello Marius!
Thank you for posting on MS Learn Q&A.
I saw that a similar issue was reported in Sweden Central, where gpt-5-mini and gpt-5-nano were consistently slower than the larger GPT-5 model, even with high TPM allocated and the same middleware path.
If you follow Microsoft's Azure OpenAI latency guidance, response time is affected not only by the model and token counts but also by the overall load on the deployment and the system, so a model-specific or region-specific capacity issue can absolutely show up as lower tokens/sec without any code change on your side.
I would treat this as a potential regional or model-level capacity regression first, especially since you already compared against another model in the same region and found that only gpt-5-mini degraded. Gather the supporting evidence (deployment names, region, timestamps, and measured throughput) and open an Azure support ticket.
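To collect comparable evidence for the ticket, you can time the same request against each deployment and compute output tokens/sec. Below is a minimal sketch using the OpenAI Python SDK's `AzureOpenAI` client; the `measure` helper and the prompt are illustrative, and you would substitute your own endpoint, credentials, and deployment names.

```python
import time

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput in output tokens per second; 0 if no time elapsed."""
    return completion_tokens / elapsed_s if elapsed_s > 0 else 0.0

def measure(client, deployment: str, prompt: str) -> dict:
    """Time one chat completion against a deployment (hypothetical helper).

    `client` is an openai.AzureOpenAI instance; on Azure, `model` takes
    the deployment name rather than the base model name.
    """
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    return {
        "deployment": deployment,
        "latency_s": round(elapsed, 3),
        "completion_tokens": resp.usage.completion_tokens,
        "tokens_per_sec": round(
            tokens_per_second(resp.usage.completion_tokens, elapsed), 1
        ),
        "request_id": resp.id,  # useful for the support ticket
    }

# Example usage (assumes `client` is already configured):
# for d in ("gpt-5-mini", "gpt-5"):
#     print(measure(client, d, "Summarize RFC 2616 in one paragraph."))
```

Running this a few times at different hours for both the degraded and the healthy deployment, then attaching the results (with request IDs and timestamps) to the ticket, gives support a clear before/after picture without any changes to your application code.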