Hello 稀渺 陈,
Welcome to Microsoft Q&A, and thank you for reaching out.
I understand that you want to perform pronunciation assessment on short audio while trying to reduce costs by using Azure’s fast transcription mode. Here’s some guidance:
Your current REST API example calls the conversation (real-time) endpoint, which is why it is billed at $1.30 per hour. This mode streams audio, performs recognition in real time, and supports the full set of pronunciation assessment features.
Azure also provides a fast transcription API for audio files. This mode is more cost-efficient (around $0.66 per hour) because it processes the entire audio after upload instead of maintaining a real-time session.
Unfortunately, pronunciation assessment is not supported on the fast transcription endpoint. It requires specific headers and models that are available only through the real-time/conversation endpoint, which includes the short-audio REST API for clips up to 30 seconds. Therefore, you cannot switch your REST call to fast transcription and still get pronunciation scores.
Use the short-audio REST API for clips up to 30 seconds to reduce costs compared to longer real-time sessions.
Split longer audio into shorter segments (under 1 to 2 minutes each) and send them to the real-time endpoint, so that you are billed only for the audio you actually need assessed.
Consider purchasing a commitment tier for Azure Speech services to gain additional cost savings.
Monitor usage carefully to ensure you stay within your budget while performing pronunciation assessment.
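As a rough illustration of the segmenting suggestion above, here is a minimal Python sketch that splits a WAV file into chunks of at most 30 seconds using only the standard library. The function name split_wav and the 30-second default are assumptions made for this example (the default mirrors the short-audio limit mentioned above); it is not an official sample.

```python
import io
import wave

# Assumption: 30 seconds, matching the short-audio limit mentioned above.
MAX_SEGMENT_SECONDS = 30


def split_wav(wav_bytes, max_seconds=MAX_SEGMENT_SECONDS):
    """Split an in-memory WAV file into WAV chunks of at most max_seconds each."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        frames_per_chunk = int(src.getframerate() * max_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the audio
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy the audio format of the source file into each chunk.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Each returned chunk is a complete WAV file that can be sent as its own request. Note that this cuts at fixed time boundaries, which can fall in the middle of a word; cutting at pauses and supplying the matching portion of the reference text for each chunk is likely to give more meaningful scores.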
To enable pronunciation assessment, make sure your code targets the endpoint that supports it and includes the required headers. If you’d like, we can provide a sample modified workflow showing how to optimize cost for pronunciation assessment using shorter audio segments.
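For reference, here is a minimal Python sketch of how the request for the short-audio REST API can be put together. The Pronunciation-Assessment header carries a base64-encoded JSON object with the assessment parameters. The helper names, the region value, and the chosen parameter values are assumptions for illustration; please check them against the documentation for your resource before relying on them.

```python
import base64
import json
import urllib.request


def build_request_parts(region, key, reference_text, language="en-US"):
    """Return (url, headers) for a pronunciation assessment call (illustrative helper)."""
    # Assessment parameters; the values here are example choices, not requirements.
    params = {
        "ReferenceText": reference_text,
        "GradingSystem": "HundredMark",
        "Granularity": "Phoneme",
        "Dimension": "Comprehensive",
    }
    # The header value is the JSON above, base64-encoded.
    encoded = base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")
    url = (
        f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
        f"conversation/cognitiveservices/v1?language={language}&format=detailed"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
        "Pronunciation-Assessment": encoded,
    }
    return url, headers


def send(url, headers, wav_bytes):
    """POST the audio and return the parsed JSON response (makes a network call)."""
    req = urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A typical call would be build_request_parts("eastus", "<your-key>", "Hello world") followed by send(url, headers, wav_bytes), where wav_bytes is a clip of 30 seconds or less. The sketch assumes 16 kHz PCM WAV input, as stated in the Content-Type header.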
Please refer to the documentation on the Speech to text REST API for short audio and on pronunciation assessment.
I hope this helps. Do let me know if you have any further queries.
If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".
Thank you!