High egress/bandwidth usage with Batch Transcription contentUrls - Does the service download files multiple times?

Antonio Feregrino 0 Reputation points
2025-12-04T18:02:22.35+00:00

Hi everyone,

I’m working with the Azure AI Speech Batch Transcription API and running into an issue with unexpected bandwidth consumption.

Setup:

We are submitting transcription jobs using the contentUrls property, pointing to audio files hosted on our own external storage (non-Azure).
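
For illustration, here is a minimal sketch of the kind of request body involved. It assumes the job is created through the Speech to text REST API `transcriptions` endpoint (v3.2 is an assumption, as is everything about our actual client). The helper name `build_transcription_request`, the example URL, the locale, and the display name are all hypothetical placeholders.

```python
import json


def build_transcription_request(content_urls, locale="en-US", display_name="batch-job"):
    """Build the JSON body for a batch transcription create request.

    Hypothetical helper: only the `contentUrls`, `locale`, and `displayName`
    properties are set; any other job properties are left out.
    """
    if not content_urls:
        raise ValueError("contentUrls must contain at least one URL")
    return {
        "contentUrls": list(content_urls),  # audio files on external storage
        "locale": locale,
        "displayName": display_name,
    }


# Placeholder URL, not a real file.
body = build_transcription_request(
    ["https://storage.example.com/audio/call-001.wav"]
)
print(json.dumps(body, indent=2))
```

The body would then be sent as an authenticated POST to the `transcriptions` endpoint of the Speech resource. The sketch leaves that call out so that nothing here depends on a particular region or key.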

Problem:

We are noticing a significant spike in egress traffic from our storage that doesn't add up. The total bandwidth consumed is notably higher than the total size of the audio files we are submitting.

We don't have granular access logs on the storage side to pinpoint exact request counts, but the traffic volume suggests the Speech service is accessing or downloading the same file multiple times per transcription job.
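
To make "doesn't add up" concrete, this is roughly how the multiplier can be estimated: total egress divided by the total size of the submitted files. The function name and the numbers below are made up for illustration and are not our real measurements.

```python
def traffic_multiplier(egress_bytes, file_sizes_bytes):
    """Return observed egress divided by the total size of the submitted files.

    A value near 1.0 means each file was downloaded about once;
    a value near 2.0 means about twice, and so on.
    """
    total_size = sum(file_sizes_bytes)
    if total_size <= 0:
        raise ValueError("total file size must be positive")
    return egress_bytes / total_size


# Hypothetical numbers: three 100 MB files and 750 MB of egress.
sizes = [100_000_000, 100_000_000, 100_000_000]
multiplier = traffic_multiplier(750_000_000, sizes)
print(f"traffic multiplier: {multiplier:.2f}")
```

This only holds if the egress figure covers nothing but requests from the Speech service. Other clients hitting the same storage would inflate the ratio.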

My Questions:

  1. Is this expected behaviour? Does the Batch Transcription service access each file more than once (e.g., HEAD probes for metadata followed by a GET, or separate downloads for different processing stages)?
  2. Retry Logic: If the service encounters a transient network issue, does it restart the download from scratch?
  3. Documentation: I’ve looked through the Batch Transcription docs but can't find any information on "single-access" guarantees or retry behaviour for contentUrls. Is this documented anywhere?
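
Since we lack access logs, one way to answer questions 1 and 2 empirically would be to point a test job at a small endpoint that records every request. Below is a rough sketch using only Python's standard `http.server`. The names `LoggingHandler`, `REQUEST_LOG`, and `make_server` are hypothetical, and it assumes the test host is reachable by the Speech service over a public URL. Note that `SimpleHTTPRequestHandler` ignores `Range` headers and always returns the whole file, so this only shows what the client asks for and does not reproduce how our real storage responds.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Each entry is (method, path, value of the Range header or None).
REQUEST_LOG = []


class LoggingHandler(SimpleHTTPRequestHandler):
    """Serve files from a directory and record every HEAD/GET request."""

    def _record(self):
        REQUEST_LOG.append((self.command, self.path, self.headers.get("Range")))

    def do_HEAD(self):
        self._record()
        super().do_HEAD()

    def do_GET(self):
        self._record()
        super().do_GET()


def make_server(directory, port=0):
    """Create (but do not start) a server for `directory`; port 0 picks a free port."""
    handler = functools.partial(LoggingHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("audio").serve_forever()` and then inspecting `REQUEST_LOG` after a job completes would show whether the same path is requested more than once, and whether any repeat requests carry a `Range` header (resumed download) or not (restart from scratch). To accept outside traffic, the bind address would need to change from `127.0.0.1`.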

Has anyone else experienced this "traffic multiplier" when hosting files externally?

Thanks!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.