The behavior is consistent with known migration and protocol differences between the Realtime Preview and Realtime GA APIs. Several GA-specific requirements can cause the session to terminate as soon as audio starts, even if parameters appear correct.
Key areas to verify and correct before opening a support ticket:
- Use the GA endpoint format
  - GA no longer accepts date-based `api-version` values or the `api-version` query parameter.
  - The WebSocket endpoint must be:
    `wss://<resource>.openai.azure.com/openai/v1/realtime`
  - If the connection is still using something like
    `wss://<resource>.openai.azure.com/openai/realtime?api-version=2025-04-01-preview`,
    the service can fail or behave unexpectedly. Remove `api-version` and ensure `/openai/v1` is present.
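As a quick sanity check, the GA endpoint shape above can be validated programmatically. A minimal sketch (the resource name is a placeholder):

```python
from urllib.parse import urlparse, parse_qs

def realtime_ga_url(resource: str) -> str:
    # GA path: /openai/v1/realtime, with no api-version query parameter
    return f"wss://{resource}.openai.azure.com/openai/v1/realtime"

url = realtime_ga_url("my-resource")  # "my-resource" is a placeholder
parsed = urlparse(url)
assert "/openai/v1/" in parsed.path                  # GA path segment present
assert "api-version" not in parse_qs(parsed.query)   # Preview parameter removed
print(url)
```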
- Update protocol and event names for GA
  If a custom client was built against the Preview protocol, several event names and message formats changed and must be updated. Using old event names can cause the server to close the session when audio starts.
  Important GA changes:
  - `session.update` now requires a `type` field:
    - `"realtime"` for speech-to-speech
    - `"transcription"` for realtime audio transcription

    Example:

    ```json
    {
      "type": "session.update",
      "session": {
        "type": "realtime",
        "model": "gpt-realtime"
        // other config
      }
    }
    ```

    If `type` is missing, the server can reject the configuration and terminate.
  - Event name changes:
    - `response.text.delta` → `response.output_text.delta`
    - `response.audio.delta` → `response.output_audio.delta`
    - `response.audio_transcript.delta` → `response.output_audio_transcript.delta`
  - Assistant message content types changed:
    - `type: "text"` → `type: "output_text"`
    - `type: "audio"` → `type: "output_audio"`
  - All conversation item events now include `object: "realtime.item"`.
Ensure all event handlers, message builders, and parsing logic are updated to these GA names and shapes. If the client still sends or expects Preview event types, the conversation can appear to “terminate” when audio is sent.
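During migration it can help to normalize incoming event names in one place, so existing handlers keep working while the client moves to the GA names. A minimal sketch (the rename table comes from the changes listed above; the dispatcher itself is illustrative):

```python
# Preview -> GA event name renames from the migration notes above
GA_EVENT_RENAMES = {
    "response.text.delta": "response.output_text.delta",
    "response.audio.delta": "response.output_audio.delta",
    "response.audio_transcript.delta": "response.output_audio_transcript.delta",
}

def normalize_event_type(event: dict) -> str:
    # Tolerate Preview names during migration, but dispatch on GA names only.
    return GA_EVENT_RENAMES.get(event["type"], event["type"])

assert normalize_event_type({"type": "response.text.delta"}) == "response.output_text.delta"
assert normalize_event_type({"type": "session.updated"}) == "session.updated"
```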
- Remove Preview-only headers
  - Do not send the `OpenAI-Beta` header in any GA requests. Keeping this header from Preview code can cause protocol issues.
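For a custom client, the header set can be built in one helper so the Preview-only header cannot creep back in. A sketch (assuming key-based auth via the usual Azure `api-key` header; adjust for your auth method):

```python
def ga_headers(api_key: str) -> dict:
    # Preview clients often added: {"OpenAI-Beta": "realtime=v1"}.
    # For GA, deliberately do NOT include OpenAI-Beta.
    return {"api-key": api_key}

headers = ga_headers("<redacted>")
assert "OpenAI-Beta" not in headers
```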
- Audio format and transport checks
  Even if this worked in Preview, confirm that the audio being sent still matches GA expectations:
  - PCM 16-bit (`pcm16`)
  - Mono
  - 24 kHz sample rate
  - For JSON transport, audio chunks must be base64-encoded and reasonably small (around 100 ms per chunk). Oversized or malformed chunks can cause the server to close the connection.
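The chunking guidance above works out to 24,000 samples/s × 2 bytes × 0.1 s = 4,800 bytes per ~100 ms chunk. A sketch of splitting raw PCM into base64 payloads (the event that would carry each chunk is not shown):

```python
import base64

SAMPLE_RATE = 24_000   # 24 kHz
BYTES_PER_SAMPLE = 2   # pcm16, mono
CHUNK_MS = 100
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 4800 bytes

def audio_chunks(pcm: bytes):
    # Yield base64-encoded ~100 ms chunks of raw pcm16 mono 24 kHz audio.
    for i in range(0, len(pcm), CHUNK_BYTES):
        yield base64.b64encode(pcm[i:i + CHUNK_BYTES]).decode("ascii")

# One second of silence -> 10 chunks of 4800 raw bytes each
chunks = list(audio_chunks(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE))
assert len(chunks) == 10
assert len(base64.b64decode(chunks[0])) == CHUNK_BYTES
```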
- SDK and client implementation
  - The GA protocol is only supported in the official OpenAI SDKs and in custom clients that fully implement the GA protocol.
  - For .NET, GA requires OpenAI .NET SDK version `2.9.0` or later. Earlier versions do not support the GA Realtime protocol and can fail when audio starts.
- When to open an Azure support ticket
  After verifying all of the following:
  - The endpoint uses `/openai/v1` and no `api-version` query parameter.
  - No `OpenAI-Beta` header is sent.
  - `session.update` includes `session.type` set correctly (`"realtime"` or `"transcription"`).
  - All event names and content types are updated to GA (`response.output_*` and `output_*` content types).
  - Audio format and chunking match the documented requirements.
  - A supported SDK version or a correctly updated custom client is used.
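The verification steps above can be folded into a small pre-flight check run before opening a ticket. A sketch under stated assumptions: the `config` dict shape here is hypothetical, not an SDK type.

```python
def preflight(config: dict) -> list[str]:
    """Return a list of problems found against the GA checklist."""
    problems = []
    if "api-version" in config.get("url", ""):
        problems.append("remove the api-version query parameter")
    if "/openai/v1/" not in config.get("url", ""):
        problems.append("endpoint must use /openai/v1")
    if "OpenAI-Beta" in config.get("headers", {}):
        problems.append("drop the Preview-only OpenAI-Beta header")
    if config.get("session", {}).get("type") not in ("realtime", "transcription"):
        problems.append('session.type must be "realtime" or "transcription"')
    return problems

ok = {
    "url": "wss://my-resource.openai.azure.com/openai/v1/realtime",
    "headers": {},
    "session": {"type": "realtime"},
}
assert preflight(ok) == []
```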
If the session still terminates immediately when speaking, collect:
- Exact endpoint URL (with resource and path, redacting keys)
- Region and deployment name
- Timestamps and correlation IDs if available
- Example `session.update` payload
- A small example of the audio payload
Then open an Azure support ticket, as this may indicate a service-side issue with the Realtime GA model.
References:
- Migration from Preview to GA version of Realtime API
- Use the GPT Realtime API for speech and audio (programming-language-csharp)
- Use the GPT Realtime API for speech and audio (programming-language-python)
- Use the GPT Realtime API for speech and audio (programming-language-typescript)
- Use the GPT Realtime API for speech and audio (programming-language-javascript)