
Azure Content Safety Studio does not detect harmful content.

Vladimirs Davidovs 0 Reputation points Microsoft Employee
2026-04-14T06:17:06.4166667+00:00

I was going through the Operationalize AI responsibly with Azure AI Foundry - Training | Microsoft Learn learning path, which includes exercises that use code and data from this repo: https://github.com/Azure-Samples/RAI-workshops.git. Aside from the learning path and the repo code being somewhat outdated, I ran into the following problem. In the exercise that shows how to moderate images, I tested the functionality with one of the images provided in the repo, which depicts animal-on-human violence, and the test returned a severity level of 'Safe' across all categories, which is obviously wrong. After trying this in the UI, I ran the same test via the APIs and got the same result. My questions are:

  • Is there something wrong with the Content Safety Studio and/or the model it is using to detect harmful content?
  • Where to report this?
  • How to deal with situations like this when developing real-world systems?
Azure AI Content Safety

An Azure service that enables users to identify content that is potentially offensive, risky, or otherwise undesirable. Previously known as Azure Content Moderator.


1 answer

  1. Alex Burlachenko 20,425 Reputation points MVP Volunteer Moderator
    2026-04-14T07:18:26.6433333+00:00

    hi Vladimirs Davidovs,

quick answer: the model's score didn't cross the severity threshold, so the image was marked 'Safe'. This is a known limitation: tune the thresholds and add extra validation layers.

yes, this is real and it does happen. It's not you doing something wrong; it's a combination of a model limitation, a policy gap, and an outdated sample mismatch.

The Content Safety image model is a probabilistic classifier, not a deterministic one, so edge cases like animal-on-human violence can slip through as 'Safe' depending on training coverage and thresholds, especially if the model focuses more on human-to-human violence signals. The repo is also outdated, so its examples may not align with current model versions and taxonomy.

The API and the Studio use the same backend, so the same result is expected. Technically, what's going on is that the score per category (violence, self-harm, etc.) stays below the threshold, so the classification is 'Safe': not because the model sees nothing, but because the confidence is below the cutoff. You can inspect the raw scores via the API (not just the label) and you will see low but non-zero values; that's key for tuning. As for where to report: open a GitHub issue on the repo for sample problems, and use Azure feedback/support for model false negatives. The real-world fix is to never rely on single-pass moderation.
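To illustrate the "score below cutoff" point, here is a minimal pure-Python sketch of the decision logic. The per-category severity dict below is hypothetical sample data, not real API output; the Azure Content Safety image API returns per-category severity levels (0, 2, 4, 6 by default), and this mimics that shape without calling the service.

```python
# Sketch of threshold-based moderation: map raw per-category
# severities to Safe/Flagged labels. The severities used here are
# hypothetical, chosen to show how a low but non-zero violence
# signal still comes out as "Safe" under the default cutoff.

def classify(severities: dict, cutoff: int = 4) -> dict:
    """Return a Safe/Flagged label per category given a severity cutoff."""
    return {
        category: "Flagged" if severity >= cutoff else "Safe"
        for category, severity in severities.items()
    }

# Hypothetical raw scores: low but non-zero Violence signal.
raw = {"Hate": 0, "SelfHarm": 0, "Sexual": 0, "Violence": 2}

print(classify(raw, cutoff=4))  # everything "Safe": the signal is below the cutoff
print(classify(raw, cutoff=2))  # lowering the cutoff flags "Violence"
```

This is why logging the raw scores matters: the label alone hides the non-zero signal you would tune against.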

Use a multi-layer approach: adjust thresholds (lower the cutoff), add custom classifiers or CV models for domain-specific violence, run secondary validation (an ensemble), and add human-in-the-loop review for borderline cases. Also log the scores, not just the labels, and build your own decision logic on top.
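The multi-layer idea above can be sketched as a small decision function. All names and threshold values here are illustrative assumptions, not part of any Azure SDK: the point is that raw scores drive a three-way allow/review/block decision instead of a single Safe label.

```python
# Illustrative multi-layer moderation sketch: threshold check first,
# with borderline scores routed to human review. Thresholds and the
# Decision type are hypothetical, not Azure SDK constructs.

from dataclasses import dataclass

@dataclass
class Decision:
    action: str   # "allow", "block", or "review"
    reason: str

def moderate(severities: dict, block_at: int = 4, review_at: int = 2) -> Decision:
    """Decide on content from raw per-category severities."""
    worst_cat = max(severities, key=severities.get)
    worst = severities[worst_cat]
    if worst >= block_at:
        return Decision("block", f"{worst_cat} severity {worst}")
    if worst >= review_at:
        # Human-in-the-loop layer for borderline cases.
        return Decision("review", f"{worst_cat} severity {worst} is borderline")
    return Decision("allow", "all categories below review threshold")

# Log the raw dict alongside the decision, not just the final label.
print(moderate({"Hate": 0, "Violence": 2}))  # borderline, routed to "review"
```

A secondary classifier or ensemble would slot in as another check before the final "allow", using the same score-driven pattern.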

    rgds,

    Alex

