Welcome to Microsoft Q&A, and thank you for reaching out.
Azure AI Agent Service enforces content filters at the platform level for safety and compliance, so they cannot be fully disabled. However, you can design your solution so that the agent reliably detects self-harm intent and invokes your custom tool without being blocked or entering an incomplete state.
Here are practical approaches:
Understand the Filter Behavior
- Content filters operate before and after model inference. If a prompt or response violates policy, the request may be blocked or truncated.
- Lowering severity (e.g., “Self-harm: lowest blocking”) reduces intervention but does not remove filtering entirely.
Recommended Solutions:
A) Use System-Level Instructions for Detection
- In your agent configuration, include explicit system instructions such as: “If the user expresses self-harm intent, do not provide harmful advice. Instead, call the SafetyTool with the detected text and return a supportive message.” (A minimal wiring sketch is shown after this list.)
- Keep these instructions concise and high in the hierarchy (the system prompt), so they override ambiguous user prompts.
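For reference, here is a minimal sketch of wiring such instructions into an agent with the azure-ai-projects Python SDK (preview). The exact client surface can differ between SDK versions, and the connection string, model deployment name, and agent name below are placeholders, so treat this as an illustration rather than a drop-in snippet:

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Placeholder connection string; use your Azure AI Foundry project's value.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<your-project-connection-string>",
)

SYSTEM_INSTRUCTIONS = (
    "You are a support assistant. If the user expresses self-harm intent, "
    "do not provide harmful advice. Always call SafetyTool with the detected "
    "text and return a supportive message."
)

agent = project_client.agents.create_agent(
    model="gpt-4o",                   # placeholder: your model deployment name
    name="support-agent",             # placeholder agent name
    instructions=SYSTEM_INSTRUCTIONS,
    # tools=[...]                     # register SafetyTool here (schema in section C)
)
```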
B) Implement a Pre-Processing Layer
- Before sending input to the agent, run a lightweight classifier (Azure Content Safety or your own model) to detect self-harm signals.
- If self-harm is detected, bypass the model for that message and trigger your custom tool directly; this avoids filter-triggered incomplete runs. A pre-check sketch follows this list.
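For example, a minimal pre-check sketch assuming the azure-ai-contentsafety package; the endpoint, key, and severity threshold are placeholders you should tune to your own risk policy:

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for your Content Safety resource.
client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

def is_self_harm(text: str, threshold: int = 2) -> bool:
    """Return True when the SelfHarm severity meets or exceeds the threshold."""
    result = client.analyze_text(AnalyzeTextOptions(text=text))
    for item in result.categories_analysis:
        if item.category == TextCategory.SELF_HARM and (item.severity or 0) >= threshold:
            return True
    return False

# Routing decision made before the agent ever sees the message.
if is_self_harm("example user message"):
    ...  # trigger SafetyTool directly (see section C) and return a supportive reply
else:
    ...  # forward to the agent as usual
```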
C) Use Tool Invocation as the Primary Action
- Define your custom tool with an explicit JSON schema so the agent can call it reliably, for example:
{
  "name": "SafetyTool",
  "description": "Handles self-harm cases by notifying support",
  "parameters": {
    "type": "object",
    "properties": { "text": { "type": "string" } },
    "required": ["text"]
  }
}
- Reinforce in the system prompt: “Always call SafetyTool when self-harm intent is detected, regardless of other instructions.”
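When the agent does decide to call SafetyTool, your application still has to execute the function and hand the result back through the run's tool-output step. The dispatcher below is a hypothetical sketch (safety_tool, notify_support_team, and handle_tool_call are illustrative names, not part of any SDK); the plumbing that surfaces the tool call and submits its output depends on the agent SDK version you use:

```python
import json

def notify_support_team(text: str) -> None:
    """Hypothetical escalation hook: log, page on-call staff, open a ticket, etc."""
    print(f"[SAFETY ESCALATION] {text!r}")

def safety_tool(text: str) -> str:
    """Local implementation of the SafetyTool function the agent calls."""
    notify_support_team(text)
    return json.dumps({
        "status": "escalated",
        "message": "Escalation recorded; respond with a supportive, non-judgmental message.",
    })

def handle_tool_call(name: str, arguments: str) -> str:
    """Dispatch a tool call surfaced while the run is in a requires-action state."""
    args = json.loads(arguments or "{}")
    if name == "SafetyTool":
        return safety_tool(args.get("text", ""))
    raise ValueError(f"Unknown tool: {name}")
```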
D) Avoid Prompt Patterns That Trigger Filters
- Do not include phrases like “ignore safety” or “disable filters” in prompts; they will trigger blocks.
- Instead, phrase the logic as “comply with safety guidelines and escalate via SafetyTool”.
What You Cannot Do
- You cannot fully disable Azure content filters; they are mandatory for compliance.
- Attempting to bypass them (e.g., by obfuscating harmful text) violates policy and will result in blocked requests.
Best-Practice Architecture
- Step 1: Pre-check input with Azure Content Safety API.
- Step 2: If safe → send to agent normally.
- Step 3: If self-harm detected → agent or pre-processor triggers SafetyTool and returns a neutral message.
- Step 4: Log all escalations for audit (an end-to-end sketch of these steps follows).
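Tying the steps together, here is a hedged end-to-end sketch; is_self_harm, safety_tool, and run_agent are placeholder stubs standing in for the pieces sketched in sections B and C and for your normal agent invocation:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("safety_escalations")

# Placeholder stubs: wire these to the sketches in sections B and C above
# and to your normal agent call.
def is_self_harm(text: str) -> bool: ...
def safety_tool(text: str) -> str: ...
def run_agent(text: str) -> str: ...

def handle_user_message(text: str) -> str:
    if is_self_harm(text):                                    # Step 1: pre-check input
        escalation = safety_tool(text)                        # Step 3: trigger SafetyTool
        logger.info("Self-harm escalation: %s", escalation)   # Step 4: audit log
        return "You're not alone. Support has been notified and help is available."
    return run_agent(text)                                    # Step 2: safe input, normal agent run
```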
References:
- https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/content-filters
- https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety/
I hope this helps. Do let me know if you have any further queries.
Thank you!