Audio capabilities in Azure OpenAI Service

Important

The content filtering system isn't applied to prompts and completions processed by the audio models such as Whisper in Azure OpenAI Service.

Audio models in Azure OpenAI Service are available via the realtime, completions, and audio APIs. The audio models are designed to handle a variety of tasks, including speech recognition, speech translation, and text to speech.

For information about the available audio models per region in Azure OpenAI Service, see the audio models, standard models by endpoint, and global standard model availability documentation.

GPT-4o audio Realtime API

GPT-4o realtime audio is designed for low-latency, real-time conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user. For more information on how to use GPT-4o realtime audio, see the GPT-4o realtime audio quickstart and how to use GPT-4o audio.
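A realtime conversation is driven by JSON events exchanged over a websocket connection. As an illustration, the sketch below builds a session configuration event; the event shape follows the public Realtime API's session.update schema, but the instructions text, voice, and turn-detection settings are placeholder assumptions, not prescribed values.

```python
import json

# Illustrative session.update event for the GPT-4o Realtime API.
# The voice, instructions, and turn_detection values are assumptions;
# consult the GPT-4o realtime audio quickstart for supported options.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["text", "audio"],
        "instructions": "You are a helpful support agent.",  # placeholder
        "voice": "alloy",                                    # placeholder voice
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "turn_detection": {"type": "server_vad"},            # server-side voice activity detection
    },
}

# Events are sent to the service as JSON text frames over the websocket.
payload = json.dumps(session_update)
```

After the session is configured, the client streams microphone audio to the service and receives audio and text deltas back as further events on the same connection.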

GPT-4o audio completions

The GPT-4o audio completions model generates audio from audio or text prompts, making it a great fit for audiobooks, audio content, and other use cases that require audio generation. It introduces the audio modality into the existing /chat/completions API. For more information on how to use GPT-4o audio completions, see the audio generation quickstart.
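Because the audio modality rides on the existing /chat/completions API, enabling it is a matter of adding audio-related fields to an otherwise ordinary request body. The sketch below shows that shape; the model/deployment name, voice, and output format values are placeholder assumptions, and the exact field set is documented in the audio generation quickstart.

```python
import json

# Illustrative /chat/completions request body with the audio modality enabled.
# "gpt-4o-audio-preview", the voice, and the format are placeholder assumptions.
request_body = {
    "model": "gpt-4o-audio-preview",          # your Azure OpenAI deployment name
    "modalities": ["text", "audio"],          # ask for both a transcript and audio
    "audio": {"voice": "alloy", "format": "wav"},
    "messages": [
        {"role": "user", "content": "Read this sentence aloud as a short clip."}
    ],
}

# The body is sent as JSON, like any other chat completions request.
serialized = json.dumps(request_body)
```

In the response, the generated audio is returned base64-encoded inside the assistant message, so it must be decoded before it can be saved or played back.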

Audio API

Audio models accessed through the /audio API can be used for speech to text, speech translation, and text to speech. To get started with the audio API, see the Whisper quickstart for speech to text.
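Each of those capabilities maps to its own operation under the /audio route of an Azure OpenAI deployment. The sketch below composes those URLs; the resource name, deployment name, and API version are placeholder assumptions, and current values are listed in the Whisper quickstart.

```python
# Sketch: composing request URLs for the Azure OpenAI /audio routes.
# RESOURCE, DEPLOYMENT, and API_VERSION are placeholder assumptions.
RESOURCE = "my-resource"      # your Azure OpenAI resource name
DEPLOYMENT = "whisper"        # your audio model deployment name
API_VERSION = "2024-06-01"    # an available API version

def audio_route(operation: str) -> str:
    """Build the full URL for an /audio operation, such as
    'transcriptions' (speech to text) or 'translations'."""
    return (
        f"https://{RESOURCE}.openai.azure.com/openai/deployments/"
        f"{DEPLOYMENT}/audio/{operation}?api-version={API_VERSION}"
    )

transcription_url = audio_route("transcriptions")  # speech to text
translation_url = audio_route("translations")      # speech translation to English
```

The audio file itself is uploaded as multipart form data to the composed URL, with the api-key header (or a Microsoft Entra token) supplying authentication.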

Note

To help you decide whether to use Azure AI Speech or Azure OpenAI Service, see the Azure AI Speech batch transcription, What is the Whisper model?, and OpenAI text to speech voices guides.