What is Azure Language PII detection?

Important

The Azure Language in Foundry Tools Text Personally Identifiable Information (PII) detection anonymization feature (synthetic replacement) is currently available in preview and licensed to you as part of your Azure subscription. Your use of this feature is subject to the terms applicable to Previews as described in the Supplemental Terms of Use for Microsoft Azure Previews and the Microsoft Products and Services Data Protection Addendum (DPA).

Azure Language in Foundry Tools Personally Identifiable Information (PII) detection is a feature offered by Azure Language. The PII detection service is a cloud-based API that utilizes machine learning and AI algorithms to help you develop intelligent applications with advanced natural language understanding. Azure Language PII detection uses Named Entity Recognition (NER) to identify and redact sensitive information from input data. The service classifies sensitive personal data into predefined categories. These categories include phone numbers, email addresses, and identification documents. This classification helps to efficiently detect and eliminate such information.

What's new

The 2025-11-15-preview version introduces the following new PII task parameters:

Capabilities

Currently, PII support is available for the following capabilities:

Language is a cloud-based service that applies Natural Language Processing (NLP) features to detect categories of personally identifiable information (PII) in text-based data. This documentation contains the following types of articles:

  • Quickstarts are getting-started instructions to guide you through making requests to the service.
  • How-to guides contain instructions for using the service in more specific or customized ways.

Typical workflow

To use this feature, you submit data for analysis and handle the API output in your application. Analysis is performed as-is, with no added customization to the model used on your data.

  1. Create an Azure Language in Foundry Tools resource, which grants you access to the features offered by Language. It generates a password (called a key) and an endpoint URL that you use to authenticate API requests.

  2. Create a request using either the REST API or the client library for C#, Java, JavaScript, and Python. You can also send asynchronous calls with a batch request to combine API requests for multiple features into a single call.

  3. Send the request containing your text data. Your key and endpoint are used for authentication.

  4. Stream or store the response locally.
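The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the official client: the endpoint, key, and API version are placeholder assumptions, and `build_pii_request` and `send` are hypothetical helper names. The request path, the `Ocp-Apim-Subscription-Key` header, and the `PiiEntityRecognition` task kind follow the shape of the Language REST API, but check the REST API reference for the version you target.

```python
import json
import urllib.request

# Placeholder assumptions: replace with the endpoint and key of your own resource.
ENDPOINT = "https://YOUR-RESOURCE-NAME.cognitiveservices.azure.com"
API_KEY = "YOUR-KEY"
API_VERSION = "2023-04-01"  # assumption: pick the API version your resource supports


def build_pii_request(texts, endpoint=ENDPOINT, key=API_KEY, language="en"):
    """Step 2: build (but do not send) a PII detection request for a list of strings."""
    body = {
        "kind": "PiiEntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [
                {"id": str(i), "language": language, "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
    }
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The key and endpoint authenticate the call (step 3).
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Steps 3 and 4: send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access, so it can be inspected first.
request = build_pii_request(["Call our office at 312-555-1234."])
print(request.get_method(), request.full_url)
# result = send(request)  # uncomment after setting ENDPOINT and API_KEY
```

Separating request construction from sending keeps the payload easy to inspect or log before any data leaves your application.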

Key features for text PII

Language offers named entity recognition to identify and categorize information within your text. The feature detects PII categories including names, organizations, addresses, phone numbers, financial account numbers or codes, and government identification numbers. A subset of this PII is protected health information (PHI). If you specify domain=phi in your request, only PHI entities are returned.
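As a rough sketch of how the PHI filter and the category output might be handled in code, the example below builds task parameters with the `domain` setting and groups detected entities by category. The helper names `pii_parameters` and `entities_by_category` are hypothetical, and `sample_response` is a hand-written dictionary shaped like a service response, not real output.

```python
from collections import defaultdict


def pii_parameters(phi_only=False):
    """Return PII task parameters; setting domain to 'phi' limits results to PHI."""
    parameters = {"modelVersion": "latest"}
    if phi_only:
        parameters["domain"] = "phi"
    return parameters


def entities_by_category(response):
    """Group detected entity texts by category, keyed by document ID."""
    grouped = {}
    for document in response["results"]["documents"]:
        categories = defaultdict(list)
        for entity in document["entities"]:
            categories[entity["category"]].append(entity["text"])
        grouped[document["id"]] = dict(categories)
    return grouped


# Hand-written sample (an assumption about the response shape, not real service output).
sample_response = {
    "kind": "PiiEntityRecognitionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {"text": "Jane Doe", "category": "Person",
                     "offset": 8, "length": 8, "confidenceScore": 0.98},
                    {"text": "312-555-1234", "category": "PhoneNumber",
                     "offset": 20, "length": 12, "confidenceScore": 0.80},
                ],
            }
        ],
        "errors": [],
    },
}

print(pii_parameters(phi_only=True))
print(entities_by_category(sample_response))
```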

Get started with PII detection

To use PII detection, you submit text for analysis and handle the API output in your application. Analysis is performed as-is, with no customization to the model used on your data. You can use PII detection through any of the following development options:

Development option | Description
Microsoft Foundry (new) portal | Foundry (new) is a cloud-based AI platform that provides streamlined access to Foundry models, agents, and tools through Foundry projects.
Foundry (classic) portal | Foundry (classic) is a cloud-based platform that supports hub-based projects and other resource types. When you sign up, you can use your own data to detect personally identifying information within text examples.
REST API or Client library (Azure SDK) | Integrate PII detection into your applications using the REST API, or the client library available in various languages.

Reference documentation and code samples

As you use this feature in your applications, see the following reference documentation and samples for Azure Language in Foundry Tools:

Development option / language | Reference documentation | Samples
REST API | REST API documentation |
C# | C# documentation | C# samples
Java | Java documentation | Java samples
JavaScript | JavaScript documentation | JavaScript samples
Python | Python documentation | Python samples

Input requirements and service limits

Responsible AI

An AI system includes not only the technology, but also the people who use it, the people affected by it, and the deployment environment. Read the transparency note for PII to learn about responsible AI use and deployment in your systems. For more information, see the following articles:

Example scenarios

  • Apply sensitivity labels - For example, based on the results from the PII service, a public sensitivity label might be applied to documents where no PII entities are detected. For documents where US addresses and phone numbers are recognized, a confidential label might be applied. A highly confidential label might be used for documents where bank routing numbers are recognized.
  • Redact some categories of personal information from documents that get wider circulation - For example, if customer contact records are accessible to frontline support representatives, the company can redact the customer's personal information other than their name from the version of the customer history to preserve the customer's privacy.
  • Redact personal information in order to reduce unconscious bias - For example, during its resume review process, a company can block name, address, and phone number to help reduce unconscious gender or other biases.
  • Replace personal information in source data for machine learning to reduce unfairness - For example, if you want to remove names that might reveal gender when training a machine learning model, you could use the service to identify them and replace them with generic placeholders for model training.
  • Remove personal information from call center transcription - For example, if you want to remove names or other PII exchanged between the agent and the customer in a call center scenario, you could use the service to identify and remove them.
  • Data cleaning for data science - PII detection can be used to prepare data so that data scientists and engineers can use it to train their machine learning models. Redacting the data first helps make sure that customer data isn't exposed.
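Two of the scenarios above, sensitivity labeling and placeholder replacement, can be sketched as post-processing over the entities the service returns. This is an illustrative sketch under stated assumptions: `sensitivity_label` and `replace_with_placeholders` are hypothetical helpers, the category names (`Address`, `PhoneNumber`, `ABARoutingNumber`, `Person`) are assumed to match the service's entity categories, the "needs review" fallback is an invented default, and entity offsets are assumed to be expressed in Unicode code points (for example, by requesting a `stringIndexType` of `UnicodeCodePoint`) so that they line up with Python string indices.

```python
def sensitivity_label(categories):
    """Map the set of detected PII categories to a label, following the scenario above."""
    categories = set(categories)
    if not categories:
        return "public"
    if "ABARoutingNumber" in categories:
        return "highly confidential"
    if categories & {"Address", "PhoneNumber"}:
        return "confidential"
    # Assumption: the scenario does not cover other categories, so flag them for review.
    return "needs review"


def replace_with_placeholders(text, entities):
    """Replace each detected entity span with a generic placeholder such as [Person].

    Entities are processed from the end of the text backwards so that earlier
    offsets stay valid while later spans are being replaced.
    """
    for entity in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = entity["offset"]
        end = start + entity["length"]
        text = text[:start] + f"[{entity['category']}]" + text[end:]
    return text


# Hand-written sample entities (not real service output).
text = "Contact Jane Doe at 312-555-1234."
entities = [
    {"text": "Jane Doe", "category": "Person", "offset": 8, "length": 8},
    {"text": "312-555-1234", "category": "PhoneNumber", "offset": 20, "length": 12},
]

print(sensitivity_label(e["category"] for e in entities))
print(replace_with_placeholders(text, entities))
```

Replacing spans from last to first avoids recomputing offsets after each substitution, which matters because placeholders rarely have the same length as the text they replace.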

Next steps

There are two ways to get started using the PII detection feature:

  • Foundry is a web-based platform that lets you use several Language features without needing to write code.
  • The quickstart article provides instructions on making requests to the service using the REST API and client library SDK.