Custom Avatar Quality Not Up to the Mark

DARSHIL SHAH7 0 Reputation points
2026-03-27T09:49:29.7133333+00:00

I recently created my own custom avatar using Azure Speech Studio.
Even though the training videos were recorded in a studio with a professional camera and a green-screen background, when an avatar video (batch) is generated, the hands often become invisible or merge with the background as they move away from the body.
This happens not only when the hand movements are fast, but also when they are slow.
As soon as the hands move beyond the outline of the body, they become invisible or merge with the background.
I don't understand why this is happening even though the quality of the training video was good.
Could you guide me on what the possible issue may be and how to resolve it?

Azure AI Speech

An Azure service that integrates speech processing into apps and services.


1 answer

  1. Sina Salam 28,281 Reputation points Volunteer Moderator
    2026-03-30T01:04:42.6133333+00:00

    Hello DARSHIL SHAH7,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your custom avatar's quality is not up to the mark.

    The issue isn't caused by your video quality; it results from limits in how the Azure avatar model handles limb movement outside its learned body zone. The most dependable fix is to retrain with controlled gesture ranges and keep motions inside the model's safe spatial boundaries. Reference: Azure AI Speech Avatars - https://learn.microsoft.com/azure/ai-services/speech-avatar/overview

    Follow the steps below to resolve the issue:

    1. Keep hand motions within torso width and avoid extending the arms far from the body silhouette. This matches the model's supported posture range. Model guidelines: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/avatar-studio
    2. Include training clips that show the hands moving left, right, above the shoulders, and below the waist, while maintaining a consistent background and staying away from the frame edges. Training guidance: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/custom-avatar
    3. Use uniform lighting, avoid shadows, eliminate green-screen spill, and remove reflective surfaces; this strengthens segmentation accuracy (a quick automated check is sketched after this list). Lighting & capture requirements: https://learn.microsoft.com/azure/ai-services/speech-avatar/how-to/preparation
    4. Ensure at least 20–30% of your dataset features controlled hand movement, not only a static talking posture, so the model learns arm behavior effectively.
    5. Avoid abrupt movements from the torso to wide positions. Use slow, gradual gestures so the model can maintain a stable limb representation.
    6. Capture the full upper body with enough side margin that the hands never touch the frame edges (the sketch below also flags this). Camera framing reference: https://learn.microsoft.com/azure/ai-services/speech-avatar/how-to/capture
    7. Azure custom avatars focus on facial realism, not full skeletal realism, meaning hand detail, occlusion handling, and wide gestures will always have limitations. Model scope: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/limitations
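
    To make checks 3 and 6 easier to verify before you upload footage, here is a minimal pre-flight sketch in Python, assuming green-screen footage and OpenCV (pip install opencv-python). The file name, HSV range, and thresholds are placeholders to tune on your own clips; this is not part of the Azure tooling, just a local sanity check.

    ```python
    import cv2
    import numpy as np

    VIDEO_PATH = "training_clip.mp4"     # hypothetical file name; use your own clip
    GREEN_LO = np.array([40, 60, 60])    # rough HSV range for a lit green screen
    GREEN_HI = np.array([85, 255, 255])  # tune both bounds on your own footage
    EDGE_MARGIN = 0.08                   # width of each side band as a fraction of frame

    cap = cv2.VideoCapture(VIDEO_PATH)
    edge_hits, bg_stds, frame_idx = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        bg_mask = cv2.inRange(hsv, GREEN_LO, GREEN_HI)  # 255 where green screen
        fg = bg_mask == 0                               # presenter pixels

        # Check 6: foreground inside the side margin bands means the hands
        # are getting too close to the frame edges.
        h, w = fg.shape
        m = max(1, int(w * EDGE_MARGIN))
        if fg[:, :m].mean() > 0.02 or fg[:, -m:].mean() > 0.02:
            edge_hits.append(frame_idx)

        # Check 3: a wide brightness spread across the green background
        # suggests shadows or uneven lighting, which weakens segmentation.
        v = hsv[:, :, 2][bg_mask > 0]
        if v.size:
            bg_stds.append(float(v.std()))
        frame_idx += 1
    cap.release()

    print(f"{len(edge_hits)} frames with the presenter near the frame edges")
    if bg_stds:
        print(f"background brightness std: mean {np.mean(bg_stds):.1f} (lower = more uniform)")
    ```

    Frames flagged by the edge check are good candidates to re-shoot with tighter gestures or wider framing, since those are the poses where the model is most likely to lose the hands.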

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it helped.

