Change modalities mid session for voice live azure api

Question

Vishal Rawat 40 Reputation points

I tried to change modalities mid-session from ["text", "audio"] to ["text"] and vice versa, to implement a switch between text and voice for my voice+chat agent. I did this by sending a session.update event with the updated modalities to the Azure Voice Live API, but it does not seem to work: even the session.updated event returned by Azure still shows the old modalities.
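For reference, a session.update event of the kind described above can be built as a plain JSON message. This is a minimal sketch assuming the Realtime-style event shape `{"type": "session.update", "session": {"modalities": [...]}}`; the helper name `build_modalities_update` is hypothetical, other session fields are omitted, and the WebSocket send step is left out.

```python
import json


def build_modalities_update(modalities: list) -> str:
    """Serialize a session.update event that requests new modalities.

    Assumes the Realtime-style event shape; other session fields are omitted.
    """
    allowed = {"text", "audio"}
    if not modalities or not set(modalities) <= allowed:
        raise ValueError(f"unsupported modalities: {modalities}")
    event = {"type": "session.update", "session": {"modalities": modalities}}
    return json.dumps(event)


# Switching from voice+chat to text-only (the change that did not take effect):
print(build_modalities_update(["text"]))
# {"type": "session.update", "session": {"modalities": ["text"]}}
```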

Any help with this?

Aryan Parashar 3,380 Reputation points Microsoft External Staff Moderator

2025-11-28T09:09:01.85+00:00

Hi Vishal Rawat,

Please let me know if you are still experiencing any issues or if you have any further queries.

Answer accepted by question author

Hi Vishal Rawat,

Thank you for reaching out, and I understand your frustration with this issue. You're not alone: several developers have run into the same problem when trying to switch modalities mid-session.

Changing modalities mid-session is problematic in both the OpenAI Realtime API and the Azure implementations.

When you send a session.update event to switch between ["text", "audio"] and ["text"], the modalities setting appears to be locked at session initialization, which matches what you observed: the session.updated event still reports the original modalities. A likely reason is that audio and text processing use different pipelines, and the underlying audio connections are set up when the session starts.
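One way to confirm this behavior from the client is to compare the modalities you requested with the ones echoed back in the session.updated event. This is a minimal sketch, assuming the server event carries the effective configuration under `session.modalities` as in the Realtime event format; `modalities_applied` is a hypothetical helper name.

```python
import json


def modalities_applied(requested: list, server_event_json: str) -> bool:
    """Return True if a session.updated event reports the requested modalities."""
    event = json.loads(server_event_json)
    if event.get("type") != "session.updated":
        raise ValueError("expected a session.updated event")
    # Assumed location of the effective modalities in the server event.
    effective = event.get("session", {}).get("modalities", [])
    # Order is not significant, so compare as sets.
    return set(effective) == set(requested)


# Example: the server still reports both modalities after a text-only request.
reply = '{"type": "session.updated", "session": {"modalities": ["text", "audio"]}}'
print(modalities_applied(["text"], reply))  # False
```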

Here are some relevant references that discuss this behavior:

https://community.openai.com/t/realtime-api-updating-modalities/996243/3

https://community.openai.com/t/realtime-modalities-session-config-not-disabling-local-model-audio-channel/1279443

https://learn.microsoft.com/en-us/answers/questions/5561090/azure-openai-gpt-realtime-generating-voice-respons

Recommended Solution:

Keep modalities: ["text", "audio"] enabled throughout the session and control audio behavior at the application level. In other words, decide in your client-side logic when audio is actually captured, sent, and played back, rather than trying to reconfigure the session.
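The client-side approach can be sketched as a small controller that flips between voice and text locally while the session itself stays on ["text", "audio"]. This is a hedged sketch, not an official pattern: it assumes Voice Live uses the Realtime-style event names `input_audio_buffer.append`, `response.audio.delta`, and `response.audio_transcript.delta` (check the Voice Live API reference), and `ModeController` and its methods are hypothetical names. Actual WebSocket I/O and audio playback are left out; `played_audio` stands in for a playback device.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModeController:
    """Client-side voice/text switch; the session keeps ["text", "audio"]."""
    voice_enabled: bool = True
    played_audio: List[bytes] = field(default_factory=list)

    def set_voice(self, enabled: bool) -> None:
        # Flip the mode locally; no session.update is sent.
        self.voice_enabled = enabled

    def mic_chunk_event(self, pcm: bytes) -> Optional[str]:
        """Return an input_audio_buffer.append event, or None in text mode."""
        if not self.voice_enabled:
            return None  # drop microphone audio while in text mode
        return json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm).decode("ascii"),
        })

    def handle_server_event(self, raw: str) -> Optional[str]:
        """Route a server event; return text to display, if any."""
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "response.audio.delta":
            if self.voice_enabled:
                # Stand-in for handing decoded audio to a playback device.
                self.played_audio.append(base64.b64decode(event["delta"]))
            return None  # in text mode the audio is simply discarded
        if kind in ("response.text.delta", "response.audio_transcript.delta"):
            return event.get("delta", "")
        return None


ctrl = ModeController()
ctrl.set_voice(False)  # user switches to chat
print(ctrl.mic_chunk_event(b"\x00\x01"))  # None: microphone audio is not sent
delta = base64.b64encode(b"\x00\x01").decode("ascii")
ctrl.handle_server_event(json.dumps({"type": "response.audio.delta", "delta": delta}))
print(len(ctrl.played_audio))  # 0: audio was discarded, not played
print(ctrl.handle_server_event(
    json.dumps({"type": "response.audio_transcript.delta", "delta": "Hi"})))  # Hi
```

The trade-off of this design is that the model still generates audio in text mode, so that bandwidth is spent and the audio is dropped on the client; in exchange, switching modes is instant and needs no session reconfiguration.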

Feel free to accept this as an answer.

Thank you for reaching out to the Microsoft Q&A portal!

Vishal Rawat 40 Reputation points

2025-11-28T09:13:47.4433333+00:00

Thanks, Aryan. I have handled the audio part on the client side, keeping the modalities as text and audio.

Aryan Parashar 3,380 Reputation points Microsoft External Staff Moderator

2025-11-28T09:24:07.41+00:00

Hi Vishal Rawat,

I’m glad to hear that this has been resolved successfully.

Thank you for reaching out to the Microsoft Q&A Portal.
