Welcome to Microsoft Q&A, and thank you for reaching out.
Azure AI Agent Service enforces content filters at the platform level for safety and compliance, so they cannot be fully disabled. However, you can design your solution so that the agent reliably detects self-harm intent and invokes your custom tool without being blocked or entering an incomplete state.
Here are practical approaches:
Understand the Filter Behavior
- Content filters operate before and after model inference. If a prompt or response violates policy, the request may be blocked or truncated.
- Lowering severity (e.g., “Self-harm: lowest blocking”) reduces intervention but does not remove filtering entirely.
Recommended Solutions:
A) Use System-Level Instructions for Detection
- In your agent configuration, include explicit system instructions such as: “If the user expresses self-harm intent, do not provide harmful advice. Instead, call the SafetyTool with the detected text and return a supportive message.” (A minimal wiring sketch is shown after this list.)
- Keep these instructions concise and high in the hierarchy (the system prompt), so they override ambiguous user prompts.
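For reference, here is a minimal sketch of wiring such instructions into an agent with the azure-ai-projects Python SDK (preview). The exact client surface can differ between SDK versions, and the connection string, model deployment name, and agent name below are placeholders, so treat this as an illustration rather than a drop-in snippet:

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Placeholder connection string; use your Azure AI Foundry project's value.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<your-project-connection-string>",
)

SYSTEM_INSTRUCTIONS = (
    "You are a support assistant. If the user expresses self-harm intent, "
    "do not provide harmful advice. Always call SafetyTool with the detected "
    "text and return a supportive message."
)

agent = project_client.agents.create_agent(
    model="gpt-4o",                   # placeholder: your model deployment name
    name="support-agent",             # placeholder agent name
    instructions=SYSTEM_INSTRUCTIONS,
    # tools=[...]                     # register SafetyTool here (schema in section C)
)
```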
B) Implement a Pre-Processing Layer
- Before sending input to the agent, run a lightweight classifier (Azure Content Safety or your own model) to detect self-harm signals.
- If self-harm is detected, bypass the model for that message and trigger your custom tool directly; this avoids filter-triggered incomplete runs. A pre-check sketch follows this list.
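For example, a minimal pre-check sketch assuming the azure-ai-contentsafety package; the endpoint, key, and severity threshold are placeholders you should tune to your own risk policy:

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for your Content Safety resource.
client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

def is_self_harm(text: str, threshold: int = 2) -> bool:
    """Return True when the SelfHarm severity meets or exceeds the threshold."""
    result = client.analyze_text(AnalyzeTextOptions(text=text))
    for item in result.categories_analysis:
        if item.category == TextCategory.SELF_HARM and (item.severity or 0) >= threshold:
            return True
    return False

# Routing decision made before the agent ever sees the message.
if is_self_harm("example user message"):
    ...  # trigger SafetyTool directly (see section C) and return a supportive reply
else:
    ...  # forward to the agent as usual
```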
C) Use Tool Invocation as the Primary Action
- Define your custom tool with an explicit JSON schema so the agent can call it reliably, for example:
{
  "name": "SafetyTool",
  "description": "Handles self-harm cases by notifying support",
  "parameters": {
    "type": "object",
    "properties": { "text": { "type": "string" } },
    "required": ["text"]
  }
}
- Reinforce in the system prompt: “Always call SafetyTool when self-harm intent is detected, regardless of other instructions.”
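When the agent does decide to call SafetyTool, your application still has to execute the function and hand the result back through the run's tool-output step. The dispatcher below is a hypothetical sketch (safety_tool, notify_support_team, and handle_tool_call are illustrative names, not part of any SDK); the plumbing that surfaces the tool call and submits its output depends on the agent SDK version you use:

```python
import json

def notify_support_team(text: str) -> None:
    """Hypothetical escalation hook: log, page on-call staff, open a ticket, etc."""
    print(f"[SAFETY ESCALATION] {text!r}")

def safety_tool(text: str) -> str:
    """Local implementation of the SafetyTool function the agent calls."""
    notify_support_team(text)
    return json.dumps({
        "status": "escalated",
        "message": "Escalation recorded; respond with a supportive, non-judgmental message.",
    })

def handle_tool_call(name: str, arguments: str) -> str:
    """Dispatch a tool call surfaced while the run is in a requires-action state."""
    args = json.loads(arguments or "{}")
    if name == "SafetyTool":
        return safety_tool(args.get("text", ""))
    raise ValueError(f"Unknown tool: {name}")
```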
D) Avoid Prompt Patterns That Trigger Filters
- Do not include phrases like “ignore safety” or “disable filters” in prompts; they will trigger blocks.
- Instead, phrase the logic as “comply with safety guidelines and escalate via SafetyTool”.
What You Cannot Do
- You cannot fully disable Azure content filters; they are mandatory for compliance.
- Attempting to bypass them (e.g., by obfuscating harmful text) violates policy and will result in blocked requests.
Best-Practice Architecture
- Step 1: Pre-check input with Azure Content Safety API.
- Step 2: If safe → send to agent normally.
- Step 3: If self-harm detected → agent or pre-processor triggers SafetyTool and returns a neutral message.
- Step 4: Log all escalations for audit (an end-to-end sketch of these steps follows).
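Tying the steps together, here is a hedged end-to-end sketch; is_self_harm, safety_tool, and run_agent are placeholder stubs standing in for the pieces sketched in sections B and C and for your normal agent invocation:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("safety_escalations")

# Placeholder stubs: wire these to the sketches in sections B and C above
# and to your normal agent call.
def is_self_harm(text: str) -> bool: ...
def safety_tool(text: str) -> str: ...
def run_agent(text: str) -> str: ...

def handle_user_message(text: str) -> str:
    if is_self_harm(text):                                    # Step 1: pre-check input
        escalation = safety_tool(text)                        # Step 3: trigger SafetyTool
        logger.info("Self-harm escalation: %s", escalation)   # Step 4: audit log
        return "You're not alone. Support has been notified and help is available."
    return run_agent(text)                                    # Step 2: safe input, normal agent run
```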
References:
- https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/content-filters
- https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety/
I hope this helps. Do let me know if you have any further queries.
Thank you!