Why are Content Safety input tokens so expensive? How to minimize them?

Nojus Dziaugys 0 Reputation points
2025-08-25T11:07:09.0133333+00:00

How can I track what the content safety metrics are spending input tokens on? And how do I minimize the content safety token costs? At the moment they are 10x all the other costs combined. Is this normal?

[image attached]

Azure AI Content Safety
An Azure service that enables users to identify content that is potentially offensive, risky, or otherwise undesirable. Previously known as Azure Content Moderator.

2 answers

  1. Sina Salam 26,661 Reputation points Volunteer Moderator
    2025-08-25T16:06:27.1266667+00:00

    Hello Nojus Dziaugys,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are asking why content safety input tokens are so expensive and how to minimize them.

    This issue typically stems from how Azure bills for Content Safety: not by OpenAI-style tokens, but by text records, each covering up to 1,000 Unicode characters. If your inputs are long or untrimmed, a single request can easily consume several records, driving up costs. According to Azure's pricing, the Standard tier charges $0.38 per 1,000 records, and all safety filters (hate, violence, sexual, self-harm) are applied unless explicitly customized.
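
    As a rough illustration, here is a minimal Python sketch of that billing math; the record size follows the definition above, and the $0.38 rate is the Standard-tier price quoted here, so verify it against the current pricing page:

    import math

    RECORD_SIZE = 1_000          # Unicode characters per text record, per the billing unit above
    PRICE_PER_1K_RECORDS = 0.38  # Standard-tier rate quoted above; verify against current pricing

    def estimate_cost(texts):
        """Estimate Content Safety cost for a batch of text inputs."""
        # Each input consumes ceil(len / 1,000) records, so a 2,500-character
        # payload bills as 3 records, not 1. Trimming inputs saves records.
        records = sum(math.ceil(len(t) / RECORD_SIZE) for t in texts)
        return records / 1_000 * PRICE_PER_1K_RECORDS

    print(estimate_cost(["short prompt", "x" * 2_500]))  # 1 + 3 = 4 records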

    To track what’s consuming these records, you can wrap the Content Safety API with Azure API Management (APIM). This allows you to apply the <llm-emit-token-metric> and <llm-token-limit> policies to monitor and control usage. You can then route diagnostics to Azure Monitor or Log Analytics and query usage with KQL, for example:

    // Assumes APIM diagnostics are routed to Log Analytics; the resource name
    // and token columns below are illustrative and depend on your policy setup.
    AzureDiagnostics
    | where Resource == "apim-content-safety"
    | project TimeGenerated, clientIP, consumedTokens, remainingTokens, apiName
    

    This setup helps you visualize usage per department, model, or client IP, and it can be extended into dashboards using Azure Monitor Workbooks or Power BI. For implementation guidance, refer to the APIM integration guide.

    To reduce costs, start by trimming input size: remove boilerplate, disclaimers, and irrelevant metadata before sending content. Batch short texts into single requests where possible, and disable unnecessary filters if your use case doesn’t require all four categories. For repeated content, implement caching to avoid redundant checks (a minimal sketch follows below). You can also enforce quotas using <llm-token-limit> to prevent overuse by specific clients or departments.
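
    For the caching point, here is a minimal sketch that hashes normalized text and reuses the verdict for repeats; call_content_safety is a hypothetical stand-in for your actual Content Safety client call:

    import hashlib

    def call_content_safety(text):
        """Hypothetical wrapper around your real Content Safety API call."""
        raise NotImplementedError

    _cache = {}

    def moderate(text):
        # Normalize and hash so identical content maps to a single cache entry.
        key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if key not in _cache:
            _cache[key] = call_content_safety(text)  # only unique texts are billed
        return _cache[key]

    In production, swap the in-memory dict for Redis or another shared store so cached verdicts survive restarts and are shared across instances.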

    If you're testing or prototyping, consider using the free tier, which offers 5,000 records per month at no cost. However, if your usage pattern still results in 10x higher costs, it's a strong signal to audit your integration and optimize both payload design and filter configuration.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.


  2. Nikhil Jha (Accenture International Limited) 4,150 Reputation points Microsoft External Staff Moderator
    2025-09-29T05:31:59.5833333+00:00

    Hello Nojus Dziaugys,

    These "Safety Evaluation Input Tokens" are a component of Azure Foundry's integrated safety checks and content moderation. The reason for the increased token usage is that they operate automatically on every request, even if you don't specifically ask for them.

    You can disable the safety evaluation tokens by unchecking the "Risk and Safety" feature, as shown in the screenshot below.
    [image attached]

    To get a better picture of the cost of each chat, look at the token usage in the portal or through SDK logging.
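
    For example, with the OpenAI Python SDK against Azure, every chat completion response carries a usage breakdown you can log; the endpoint, key, API version, and deployment name below are placeholders:

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
        api_key="<your-key>",                                       # placeholder
        api_version="2024-02-01",                                   # pick a supported version
    )

    response = client.chat.completions.create(
        model="<your-deployment>",  # your deployment name
        messages=[{"role": "user", "content": "Hello"}],
    )

    # usage breaks down the billed tokens for this single call
    u = response.usage
    print(u.prompt_tokens, u.completion_tokens, u.total_tokens)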

    Thanks to Sina for the prompt response and for helping the community. Here are a few more methods you can adopt to control costs.

    How to reduce costs:

    1. Avoid redundant checks: Moderate either user input or model output, not both, unless necessary.
    2. Trim inputs: Remove HTML, metadata, disclaimers, or other irrelevant content.
    3. Prefilter content: Apply lightweight local checks (regex, profanity lists, heuristics) and only call Content Safety on ambiguous or high-risk content (see the sketch after this list).
    4. Cache repeated content: For identical messages, reuse previous moderation results.
    5. Sample low-risk traffic: Moderate a subset of messages when appropriate.
    6. Use built-in guardrails: Configure model-level filters to prevent disallowed outputs, reducing the need for post-checking.
    7. Commitment or enterprise tiers: For high-volume use, consider committed plans for cost savings.
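
    For point 3, here is a minimal prefilter sketch; the regex patterns are hypothetical placeholders, and in practice you would use a maintained blocklist tuned to your domain:

    import re

    # Hypothetical high-risk patterns; replace with a curated list for your use case.
    HIGH_RISK = re.compile(r"\b(threat|weapon|suicide)\b", re.IGNORECASE)

    def triage(text):
        """Cheap local pre-check so only ambiguous content hits the billed API."""
        stripped = text.strip()
        if not stripped:
            return "allow"        # empty input: nothing to moderate
        if HIGH_RISK.search(stripped):
            return "moderate"     # likely risky: worth a Content Safety call
        if len(stripped) < 20:
            return "allow"        # short, pattern-free text: skip the billed call
        return "moderate"         # longer or ambiguous content: let the service decide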

    Resources:
    Manage Azure OpenAI costs
    Azure AI Content Safety FAQ


    If this answer is helpful, kindly accept and upvote it so other community members can benefit.

