Custom Avatar Quality Not Up to the Mark

DARSHIL SHAH7 0 Reputation points
2026-03-27T09:49:29.7133333+00:00

I recently created my own custom avatar using Azure Speech Studio.
Even though the training videos were recorded in a studio with a professional camera and a green-screen background, when an avatar video (batch) is generated, the hands often become invisible or merge with the background as they move away from the body.
This happens not only when the hand movements are fast, but also when they are slow.
As soon as the hands move beyond the outline of the body, they become invisible or merge with the background.
I don't understand why this is happening even though the quality of the training video was good.
Could you guide me on what the possible issue may be and how to resolve it?

Azure AI Speech

An Azure service that integrates speech processing into apps and services.


1 answer

  1. Sina Salam 28,281 Reputation points Volunteer Moderator
    2026-03-30T01:04:42.6133333+00:00

    Hello DARSHIL SHAH7,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your custom avatar's quality is not up to the mark.

    The issue isn't caused by your video quality; it results from limits in how the Azure avatar model handles limb movement outside its learned body zone. The most dependable fix is to retrain with controlled gesture ranges and keep motions inside the model's safe spatial boundaries. Reference: Azure AI Speech Avatars - https://learn.microsoft.com/azure/ai-services/speech-avatar/overview

    Follow the steps below to resolve the issue:

    1. Keep hand motions within torso width and avoid extending the arms far from the body silhouette. This matches the model's supported posture range. Model guidelines: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/avatar-studio
    2. Include training clips that show the hands moving left, right, above the shoulders, and below the waist, while maintaining a consistent background and staying away from the frame edges. Training guidance: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/custom-avatar
    3. Use uniform lighting, avoid shadows, eliminate green-screen spill, and remove reflective surfaces; this strengthens segmentation accuracy (a quick automated check is sketched after this list). Lighting & capture requirements: https://learn.microsoft.com/azure/ai-services/speech-avatar/how-to/preparation
    4. Ensure at least 20–30% of your dataset features controlled hand movement, not only a static talking posture, so the model learns arm behavior effectively.
    5. Avoid abrupt movements from the torso to wide positions. Use slow, gradual gestures so the model can maintain a stable limb representation.
    6. Capture the full upper body with enough side margin that the hands never touch the frame edges (the sketch below also flags this). Camera framing reference: https://learn.microsoft.com/azure/ai-services/speech-avatar/how-to/capture
    7. Azure custom avatars focus on facial realism, not full skeletal realism, meaning hand detail, occlusion handling, and wide gestures will always have limitations. Model scope: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/limitations
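
    To make checks 3 and 6 easier to verify before you upload footage, here is a minimal pre-flight sketch in Python, assuming green-screen footage and OpenCV (pip install opencv-python). The file name, HSV range, and thresholds are placeholders to tune on your own clips; this is not part of the Azure tooling, just a local sanity check.

    ```python
    import cv2
    import numpy as np

    VIDEO_PATH = "training_clip.mp4"     # hypothetical file name; use your own clip
    GREEN_LO = np.array([40, 60, 60])    # rough HSV range for a lit green screen
    GREEN_HI = np.array([85, 255, 255])  # tune both bounds on your own footage
    EDGE_MARGIN = 0.08                   # width of each side band as a fraction of frame

    cap = cv2.VideoCapture(VIDEO_PATH)
    edge_hits, bg_stds, frame_idx = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        bg_mask = cv2.inRange(hsv, GREEN_LO, GREEN_HI)  # 255 where green screen
        fg = bg_mask == 0                               # presenter pixels

        # Check 6: foreground inside the side margin bands means the hands
        # are getting too close to the frame edges.
        h, w = fg.shape
        m = max(1, int(w * EDGE_MARGIN))
        if fg[:, :m].mean() > 0.02 or fg[:, -m:].mean() > 0.02:
            edge_hits.append(frame_idx)

        # Check 3: a wide brightness spread across the green background
        # suggests shadows or uneven lighting, which weakens segmentation.
        v = hsv[:, :, 2][bg_mask > 0]
        if v.size:
            bg_stds.append(float(v.std()))
        frame_idx += 1
    cap.release()

    print(f"{len(edge_hits)} frames with the presenter near the frame edges")
    if bg_stds:
        print(f"background brightness std: mean {np.mean(bg_stds):.1f} (lower = more uniform)")
    ```

    Frames flagged by the edge check are good candidates to re-shoot with tighter gestures or wider framing, since those are the poses where the model is most likely to lose the hands.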

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it helped.

