
Azure Content Safety Studio does not detect harmful content.

Vladimirs Davidovs 0 Reputation points Microsoft Employee
2026-04-14T06:17:06.4166667+00:00

I was going through the Operationalize AI responsibly with Azure AI Foundry - Training | Microsoft Learn learning path, which includes exercises that use code and data from this repo: https://github.com/Azure-Samples/RAI-workshops.git. Aside from the learning path and the repo code being somewhat outdated, I ran into the following problem. In the exercise that shows how to moderate images, I tested the functionality with one of the images provided in the repo, which depicts animal-on-human violence, and the test returned a severity level of 'Safe' across all categories, which is obviously wrong. After trying this in the UI, I ran the same test via the APIs and got the same result. My questions are:

  • Is there something wrong with the Content Safety Studio and/or the model it is using to detect harmful content?
  • Where to report this?
  • How to deal with situations like this when developing real-world systems?
Azure AI Content Safety

An Azure service that enables users to identify content that is potentially offensive, risky, or otherwise undesirable. Previously known as Azure Content Moderator.


1 answer

  1. Alex Burlachenko 20,425 Reputation points MVP Volunteer Moderator
    2026-04-14T07:18:26.6433333+00:00

    hi Vladimirs Davidovs,

quick answer: the model's score didn't cross the severity threshold, so the image was marked 'Safe'. This is a known limitation: tune the thresholds and add extra validation layers.

yes, this is real and it does happen. It's not you doing something wrong; it's a combination of a model limitation, a policy gap, and an outdated sample mismatch.

The Content Safety image model is a probabilistic classifier, not a deterministic one, so edge cases like animal-on-human violence can slip through as 'Safe' depending on training coverage and thresholds, especially if the model focuses more on human-to-human violence signals. The repo is also outdated, so its examples may not align with current model versions and taxonomy.

The API and the Studio use the same backend, so the same result is expected. Technically, what's going on is that the score per category (violence, self-harm, etc.) stays below the threshold, so the classification is 'Safe': not because the model sees nothing, but because the confidence is below the cutoff. You can inspect the raw scores via the API (not just the label) and you will see low but non-zero values; that's key for tuning. As for where to report: open a GitHub issue on the repo for sample problems, and use Azure feedback/support for model false negatives. The real-world fix is to never rely on single-pass moderation.
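To illustrate the "score below cutoff" point, here is a minimal pure-Python sketch of the decision logic. The per-category severity dict below is hypothetical sample data, not real API output; the Azure Content Safety image API returns per-category severity levels (0, 2, 4, 6 by default), and this mimics that shape without calling the service.

```python
# Sketch of threshold-based moderation: map raw per-category
# severities to Safe/Flagged labels. The severities used here are
# hypothetical, chosen to show how a low but non-zero violence
# signal still comes out as "Safe" under the default cutoff.

def classify(severities: dict, cutoff: int = 4) -> dict:
    """Return a Safe/Flagged label per category given a severity cutoff."""
    return {
        category: "Flagged" if severity >= cutoff else "Safe"
        for category, severity in severities.items()
    }

# Hypothetical raw scores: low but non-zero Violence signal.
raw = {"Hate": 0, "SelfHarm": 0, "Sexual": 0, "Violence": 2}

print(classify(raw, cutoff=4))  # everything "Safe": the signal is below the cutoff
print(classify(raw, cutoff=2))  # lowering the cutoff flags "Violence"
```

This is why logging the raw scores matters: the label alone hides the non-zero signal you would tune against.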

Use a multi-layer approach: adjust thresholds (lower the cutoff), add custom classifiers or CV models for domain-specific violence, run secondary validation (an ensemble), and add human-in-the-loop review for borderline cases. Also log the scores, not just the labels, and build your own decision logic on top.
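The multi-layer idea above can be sketched as a small decision function. All names and threshold values here are illustrative assumptions, not part of any Azure SDK: the point is that raw scores drive a three-way allow/review/block decision instead of a single Safe label.

```python
# Illustrative multi-layer moderation sketch: threshold check first,
# with borderline scores routed to human review. Thresholds and the
# Decision type are hypothetical, not Azure SDK constructs.

from dataclasses import dataclass

@dataclass
class Decision:
    action: str   # "allow", "block", or "review"
    reason: str

def moderate(severities: dict, block_at: int = 4, review_at: int = 2) -> Decision:
    """Decide on content from raw per-category severities."""
    worst_cat = max(severities, key=severities.get)
    worst = severities[worst_cat]
    if worst >= block_at:
        return Decision("block", f"{worst_cat} severity {worst}")
    if worst >= review_at:
        # Human-in-the-loop layer for borderline cases.
        return Decision("review", f"{worst_cat} severity {worst} is borderline")
    return Decision("allow", "all categories below review threshold")

# Log the raw dict alongside the decision, not just the final label.
print(moderate({"Hate": 0, "Violence": 2}))  # borderline, routed to "review"
```

A secondary classifier or ensemble would slot in as another check before the final "allow", using the same score-driven pattern.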

    rgds,

    Alex

