Azure Speech DragonHDLatestNeural produces electrical/static noise during speech in both raw MP3 and RIFF WAV outputs

Question

August 0

Hi,

I’m seeing a consistent audio quality issue with Azure Speech HD voices, and I want to confirm whether this is a known DragonHDLatestNeural problem or something else in my setup.

The issue is not just “bad playback” or a mislabeled WAV file. The sound is an electrical/static noise that appears whenever the voice is speaking. Silence is mostly fine, but the static rides on top of the spoken audio.

What I tested:

Voice: en-US-Steffan:DragonHDLatestNeural
Service: Azure Speech TTS
Input: SSML/plain narration text
Output formats tested directly from Azure:
- audio-48khz-192kbitrate-mono-mp3
- riff-48khz-16bit-mono-pcm

Important detail:

I tested both a direct raw MP3 output and a direct RIFF WAV output from Azure.
The problem is still present in the raw Azure output itself.
Because of that, this does not look like the usual “RAW PCM saved as .wav” header/container issue.

What I want to understand:

Is this a known issue with DragonHDLatestNeural voices?
Can HD voices sometimes generate static/noise artifacts on certain text or SSML patterns?
Are there recommended output formats, regions, or voice settings for avoiding this?
Is there any guidance for diagnosing whether this is voice-model output vs. service-side encoding?

Answer 1

Hi @August,

Based on the current documentation, there is no documented known issue in which DragonHDLatestNeural voices (including en-US-Steffan:DragonHDLatestNeural) consistently generate electrical or static noise. Azure Speech HD voices are designed to produce high‑fidelity audio, and static artifacts are not described as a normal characteristic of the model or of any supported output format.

That said, since you’re able to reproduce the noise directly in the raw Azure‑generated MP3 and RIFF PCM outputs, this does help rule out common client‑side causes such as WAV header/container mismatches or local playback issues. The behavior appearing only during voiced segments (with silence remaining clean) does suggest a synthesis‑stage artifact rather than a decoding problem.

To further isolate the scope, here are the recommended diagnostics:

Swap voices and regions

Test another HD voice (for example, en-US-Ava:DragonHDLatestNeural) in the same region.
As a control, test a non‑HD neural voice (such as en-US-AriaNeural).
If the noise occurs only on Steffan or only in a specific region, that points toward a voice‑ or region‑specific issue rather than a client setup.

Test alternate output formats

In addition to 48 kHz MP3 (192 kbps) and RIFF PCM, try 24 kHz formats (MP3 or RIFF PCM) and compare.
Continue inspecting the raw binary output without any post‑processing to confirm behavior remains consistent.
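
As a quick check on the raw bytes, the sketch below reads the RIFF header of a saved response and reports what it actually declares. It uses only Python's standard `wave` module; the function name `describe_wav` is illustrative, and the frame count is only as trustworthy as the size fields the service wrote into the header.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic facts about a RIFF/WAVE payload, or raise ValueError."""
    # A RIFF/WAVE file starts with b"RIFF", four size bytes, then b"WAVE".
    # Headerless PCM saved with a .wav extension fails this check.
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload (possibly headerless PCM)")
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "frames": wav.getnframes(),
        }
```

For a `riff-48khz-16bit-mono-pcm` response you would expect one channel, a 48000 Hz sample rate, and 16 bits per sample. Anything else suggests the file was altered after it left the service.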

Minimize input complexity

Test with a very short, plain‑text phrase (for example, “Hello world”).
If the artifact disappears, it may be correlated with specific SSML constructs or text patterns, which has been seen in isolated HD‑voice quality reports.
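
To keep the minimal test free of stray markup, the input can be generated rather than hand-edited. This is a small sketch under my own naming (`minimal_ssml` is not part of any SDK): it wraps plain text in the smallest SSML document that selects a voice and escapes XML-special characters.

```python
from xml.sax.saxutils import escape, quoteattr


def minimal_ssml(voice: str, text: str, lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document for the given voice."""
    return (
        "<speak version='1.0' "
        "xmlns='http://www.w3.org/2001/10/synthesis' "
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects &, < and > in the narration text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

Calling `minimal_ssml("en-US-Steffan:DragonHDLatestNeural", "Hello world")` and then the same call with `en-US-AriaNeural` gives two inputs that differ only in the voice name.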

Cross‑validate call paths

If you’re using the Speech SDK, ensure it’s on the latest version.
As you’ve already done, validating via the REST API is a good approach to rule out SDK or streaming effects.
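
For the REST cross-check, a minimal sketch using only the Python standard library could look like the following. The endpoint shape and header names are those of the Speech text-to-speech REST API as I understand it; the function names and the User-Agent string are placeholders, and it assumes the HD voices are reachable through this endpoint in your region.

```python
import urllib.request


def build_tts_request(region: str, key: str, ssml: str,
                      output_format: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech REST request."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "dragon-hd-noise-repro",  # placeholder client name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


def synthesize(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the audio bytes exactly as received."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Writing the returned bytes straight to disk in binary mode, once per voice and format combination, keeps the comparison free of any SDK-side streaming or post-processing.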

Kindly review the above steps and let me know if you need further assistance.

August 0 Reputation points

2026-04-28T22:32:33.5966667+00:00

Hi Thanmayi,

Thanks for the diagnostic checklist. I ran through it and the result narrows the scope cleanly. The static is isolated to the Dragon HD voice family. Every Dragon HD voice I tested reproduces the same electrical/static artifact during voiced segments; every non‑Dragon voice in the same region, with identical SSML and identical output format, is clean.

Test matrix (region eastus2, same SSML, same text):

| Voice | Format | Result |
| --- | --- | --- |
| `en-US-Steffan:DragonHDLatestNeural` | `audio-48khz-192kbitrate-mono-mp3` | Static on voiced audio |
| `en-US-Steffan:DragonHDLatestNeural` | `riff-48khz-16bit-mono-pcm` | Static on voiced audio |
| `en-US-Ava:DragonHDLatestNeural` | `riff-48khz-16bit-mono-pcm` | Static on voiced audio |
| `en-US-AriaNeural` (non‑HD) | `riff-48khz-16bit-mono-pcm` | Clean |
| Other non‑Dragon neural voices | same | Clean |

What this rules out:

Not encoding / container. Static is present in raw uncompressed riff-48khz-16bit-mono-pcm, so it cannot be MP3 compression or a WAV header issue. The artifact is baked into the synthesis output before the encoder.

Not client / playback. Same raw bytes from Azure carry the noise regardless of player.

Not region or account configuration. Non‑Dragon voices in the same region, with the same key, SSML, and format, render cleanly.

Not SSML‑specific. I reduced input to plain short phrases (e.g. "Hello world") and the static is still present on Dragon HD voices.

What this points to: a synthesis‑stage artifact specific to the Dragon HD model family, not the Speech service encoding pipeline or my client setup.
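
For anyone who wants a rough number instead of a listening test, here is a sketch of one crude measure: the ratio of first-difference energy to signal energy over voiced frames of a 16-bit mono WAV. It is only a proxy for high-frequency content, not a calibrated noise metric, and the frame length and RMS threshold are arbitrary assumptions. It is only meaningful when comparing the same text across two voices.

```python
import array
import io
import math
import sys
import wave


def read_mono16(data: bytes) -> array.array:
    """Decode a 16-bit mono RIFF/WAVE payload into signed samples."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are little-endian
    return samples


def voiced_hf_ratio(samples, frame_len: int = 960,
                    voiced_rms: float = 500.0) -> float:
    """Mean first-difference-to-signal energy ratio over voiced frames."""
    ratios = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        if math.sqrt(energy / frame_len) < voiced_rms:
            continue  # below the threshold: treat the frame as silence
        diff_energy = sum((b - a) ** 2 for a, b in zip(frame, frame[1:]))
        ratios.append(diff_energy / energy)
    return sum(ratios) / len(ratios) if ratios else 0.0
```

If the Dragon HD file scores clearly higher than the non-HD control on the same phrase, that is consistent with broadband noise riding on the voiced audio, though natural differences between voices also move this number.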

Could this be escalated to the Dragon HD model team?

Thanks!

Answer 2

The provided information does not document any known issue where DragonHDLatestNeural voices (such as en-US-Steffan:DragonHDLatestNeural) consistently produce electrical/static noise in the synthesized audio.

The documentation describes DragonHDLatestNeural voices as high‑definition neural voices based on large language models, with enhanced contextual understanding and expressive capabilities, but it does not mention static/noise artifacts as an expected or known behavior for specific text or SSML patterns.

Supported usage and configuration:
- DragonHDLatestNeural voices are used through the standard Azure Speech SDK and REST APIs, with the same output formats as non‑HD voices.
- HD voices can be referenced in SSML using the voicename:basemodel:version format, for example:
```
     <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
       <voice name='en-US-Ava:DragonHDLatestNeural' parameters='temperature=0.8'>Here is a test</voice>
     </speak>
```
- The documentation lists supported SSML elements for DragonHD voices (for example, <phoneme>, <say-as>, <sub>, <p>, <s>), but does not associate any of them with noise/static issues.
- There is no guidance in the provided material that recommends specific output formats, regions, or voice parameters (such as temperature, top_p, top_k, or cfg_scale) as a way to avoid static or encoding artifacts. Parameter tuning is documented only for controlling expressiveness, stability, and style for DragonHDOmni voices, not for mitigating noise.

Diagnosing model vs. encoding issues:
- The context does not provide a diagnostic procedure specific to distinguishing between voice‑model artifacts and service‑side encoding problems for Azure Speech HD voices.
- It does, however, emphasize that HD voices are standard Azure Speech voices that share the same SDK/REST surface and that they support multiple languages and advanced prosody, without calling out any special handling or known defects in the audio encoding pipeline.

Within the supplied documentation, there is no statement that:

- DragonHDLatestNeural voices are known to generate static/electrical noise; or
- Certain SSML patterns or text inputs are known to cause such artifacts; or
- Particular formats/regions/settings are recommended specifically to avoid static.

