Credential Detection Gaps in Azure Language Service vs. CredScan

KwangJe Cho 0 Reputation points Microsoft Employee
2026-01-21T19:34:29.37+00:00

I am from the DevDiv Data Platform team, and we are currently exploring the Azure AI Language (Cognitive) Service to redact PII and credentials within our telemetry. While we currently use the CredScan SDK with Python libraries, we are looking to migrate to the Azure Language Service for scalability reasons.

During our evaluation, we found that several credentials successfully redacted by CredScan are not handled correctly by the Azure Language Service. We are reaching out to see if your team has a plan to consistently update the detection logic to cover a broader range of credentials, or if the service supports custom regex/rules to bridge these gaps.

Our benchmarking revealed that the Azure Language Service missed 19 out of 46 credentials. Key findings from our report include:

Critical Credential Detection Failures

General & Cloud Secrets: Missed X.509 Private Keys, ASP.NET Machine Keys, and AWS Secret Access Keys.

Azure-Specific Keys: Failed to detect Azure Management Certificates, Redis Connection Strings, and Azure Batch Shared Access Keys.

DevOps & CI/CD: Missed GitHub Personal Access Tokens (PATs), NPM Author tokens, and Slack Access Tokens.

Authentication: Failed to identify Web Authentication Cookies (FedAuth) and OAuth Client Secrets.

PII & Clean Text False Positives

We also observed issues with over-redaction where non-sensitive information was incorrectly flagged:

  • Invalid Formats: The service detected a partial date within an invalid IP address and flagged an invalid email TLD as a Person/Organization.
    
  • Business Language: Common phrases like "personally identifiable" and relative time references like "yesterday" were flagged as PII with high confidence.
    

We would appreciate your guidance on any recommendations you have for achieving CredScan-level parity, or if there is a roadmap for these improvements.

Azure Language in Foundry Tools
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

2 answers

  1. Sina Salam 27,796 Reputation points Volunteer Moderator
    2026-01-23T11:08:44.76+00:00

    Hello KwangJe Cho,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are seeing credential detection gaps in Azure Language Service compared with CredScan.

    To clarify, here are answers to the core questions:

    Is Azure Language planning to expand credential detection?

    No public roadmap, and model design makes parity with CredScan unlikely.

    Can I add custom regex/rules to Azure Language PII detection?

    Absolutely not. The model is fixed.

    Why were 19/46 secrets missed?

    Because Azure Language is not engineered for secret detection or pattern scanning.

    How do I achieve CredScan‑level coverage?

    You cannot do it with Azure Language alone. You must combine CredScan + your own regex + Azure Language PII.

    Can entity Synonyms help detect tokens or secrets?

    No. They have zero effect on secret detection.

    In practical terms, Azure AI Language provides fixed ML- and pattern-based PII detection. It cannot accept custom regex, perform secret scanning, or replace security tools such as CredScan. Its models operate as-is, with no rule injection or credential-level guarantees, as described in Microsoft’s documentation.

    To achieve reliable protection, run CredScan first to capture keys, tokens, certificates, and connection strings, then apply your custom regex rules for organization-specific secret formats, and finally use Azure AI Language strictly for traditional PII such as names and emails. This layered approach improves recall, reduces false positives, and lets each component perform the task it was built for.
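The layered order described above can be sketched as a small pipeline of redaction stages. This is a minimal sketch, not a reference implementation: all three stage functions are hypothetical stand-ins. The first would wrap your CredScan invocation, the second your organization's own rules (the `contoso_` token format is invented for illustration), and the third the Azure Language PII call.

```python
import re
from typing import Callable, Iterable

# A stage takes text and returns text with its findings replaced by placeholders.
Stage = Callable[[str], str]


def run_pipeline(text: str, stages: Iterable[Stage]) -> str:
    """Apply each stage in order; later stages only see already-redacted text."""
    for stage in stages:
        text = stage(text)
    return text


def secret_scan_stage(text: str) -> str:
    # Stand-in for stage 1 (assumption): a real deployment would call CredScan here.
    return re.sub(r"AccountKey=[^;\s]+", "AccountKey=[SECRET]", text)


def org_regex_stage(text: str) -> str:
    # Stage 2: a hypothetical organization-specific token format.
    return re.sub(r"\bcontoso_[0-9a-f]{32}\b", "[ORG_TOKEN]", text)


def pii_stage(text: str) -> str:
    # Stand-in for stage 3 (assumption): a real deployment would call the
    # Azure Language PII service here instead of this toy email pattern.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)


STAGES = [secret_scan_stage, org_regex_stage, pii_stage]
```

Running secrets first means the PII stage never receives raw credentials, so a secret that the PII model would have ignored or mislabeled is already gone by the time it runs.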

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    If this answer is helpful, please close out the thread by upvoting it and accepting it as the answer.


  2. Anshika Varshney 7,785 Reputation points Microsoft External Staff Moderator
    2026-01-21T22:00:39.6066667+00:00

    Hi KwangJe Cho,

    Thanks for raising this.

    At the moment, PII and credential detection in Azure Language services (including Text Analytics / AI Language) is based on pattern and model-driven detection, not on full semantic or contextual understanding. Because of that, the service can miss credentials in scenarios where:

    • The credential format does not match known or common patterns
    • The value is embedded in free‑form text or custom token structures
    • The credential looks similar to a generic string (e.g., mixed alphanumeric content without clear prefixes)
    • The text is truncated, obfuscated, or lacks clear separators

    This behavior is expected and is documented as part of the limitations of the current PII detection models. The service is designed to minimize false positives, which means some true positives, especially edge cases, may not be detected automatically.
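For the prefix-less case in the list above (a credential that looks like a generic alphanumeric string), one common heuristic outside the service is to flag long tokens with high Shannon entropy. The sketch below is an assumption about how you might do that in your own code; it is not how Azure Language or CredScan works, and the `threshold` of 4.0 bits per character and the 20-character minimum are illustrative values you would need to tune.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Runs of 20+ characters drawn from a base64-like alphabet (illustrative choice).
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def high_entropy_candidates(text: str, threshold: float = 4.0) -> list[str]:
    """Return long tokens whose entropy meets the threshold (possible secrets)."""
    return [m for m in CANDIDATE.findall(text) if shannon_entropy(m) >= threshold]
```

Entropy checks tend to trade false negatives for false positives (hashes, GUID-like IDs, and encoded payloads also score high), so treat the result as candidates for review or for a stricter second check.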

    A few workarounds you may consider:

    • Combine Azure Language PII detection with custom regex or application‑level validation for known credential formats in your workload.
    • If you have organization‑specific credential patterns (API keys, internal tokens, etc.), handle them outside the built‑in PII categories.
    • For sensitive workflows, consider using multiple layers of validation (PII detection + secret scanning + logging safeguards).
    • Share anonymized examples through your support channel or feedback so the product team can evaluate model improvements.
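The first workaround (custom regex for known credential formats) could look roughly like the sketch below. The patterns are illustrative assumptions based on publicly recognizable token shapes; they are not CredScan's rule set and will not cover every variant of each credential type.

```python
import re

# Illustrative patterns only (assumptions), keyed by a rule name used in placeholders.
CREDENTIAL_PATTERNS = {
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "slack_token": re.compile(r"\bxox[abprs]-[A-Za-z0-9-]{10,}"),
    "pem_private_key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,  # key bodies span multiple lines
    ),
}


def redact_credentials(text: str) -> tuple[str, list[str]]:
    """Replace each match with a typed placeholder and report which rules fired."""
    fired = []
    for name, pattern in CREDENTIAL_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            fired.append(name)
    return text, fired
```

Running this before sending text to the PII service keeps known credential formats out of the request entirely, and the list of fired rules gives you a simple signal to log for coverage tracking.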

    Please let me know if you have any remaining questions or need additional details; I’ll be glad to provide further clarification or guidance.

    Hope this helps!

