Important
The Azure Language in Foundry Tools Text Personally Identifiable Information (PII) detection anonymization feature (synthetic replacement) is currently available in preview and licensed to you as part of your Azure subscription. Your use of this feature is subject to the terms applicable to Previews as described in the Supplemental Terms of Use for Microsoft Azure Previews and the Microsoft Products and Services Data Protection Addendum (DPA).
Azure Language in Foundry Tools Personally Identifiable Information (PII) detection is a feature offered by Azure Language. The PII detection service is a cloud-based API that utilizes machine learning and AI algorithms to help you develop intelligent applications with advanced natural language understanding. Azure Language PII detection uses Named Entity Recognition (NER) to identify and redact sensitive information from input data. The service classifies sensitive personal data into predefined categories. These categories include phone numbers, email addresses, and identification documents. This classification helps to efficiently detect and eliminate such information.
Tip
Try PII detection in the Microsoft Foundry portal, where you can use an existing Language Studio resource or create a new Foundry resource.
What's new
The 2025-11-15-preview version introduces the following new PII task parameters:
- Multiple redaction policies let you apply different redaction approaches within a single request.
- Configurable confidence threshold lets you set a minimum confidence score. Entities are included in the output only if their confidence score meets or exceeds the specified threshold.
- Disable type validation enforcement lets you bypass entity type validation. By default, the service enforces validation across multiple entity types to ensure data integrity and minimize false positives. Disabling this enforcement can enhance operational efficiency in cases where strict validation isn't required.
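To make the confidence threshold rule concrete, here is a minimal client-side sketch of the same "meets or exceeds" logic. It doesn't call the service and doesn't use the preview parameter itself. The `confidenceScore` field name follows the usual shape of PII entity results, the sample entities are invented for illustration, and `filter_by_confidence` is a hypothetical helper.

```python
def filter_by_confidence(entities: list[dict], threshold: float) -> list[dict]:
    """Keep only entities whose confidence score meets or exceeds the threshold."""
    return [e for e in entities if e["confidenceScore"] >= threshold]


# Illustrative entities, not real service output.
sample_entities = [
    {"text": "312-555-1234", "category": "PhoneNumber", "confidenceScore": 0.95},
    {"text": "Contoso", "category": "Organization", "confidenceScore": 0.42},
    {"text": "Redmond", "category": "City", "confidenceScore": 0.80},
]

kept = filter_by_confidence(sample_entities, threshold=0.8)
print([e["category"] for e in kept])
```

An entity scoring exactly at the threshold is kept, which matches the "meets or exceeds" wording above.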
The following entities are available in preview:
- Airport
- DateOfBirth
- BankAccountNumber
- CASocialIdentificationNumber
- CVV (Card Verification Value)
- City
- PassportNumber
- DriversLicenseNumber
- ExpirationDate
- Geopolitical Entity
- KRDriversLicenseNumber
- KRPassportNumber
- KRSocialSecurityNumber
- LicensePlate
- Location
- Password
- SortCode
- State
- USMedicareBeneficiaryId
- VIN (vehicle identification number)
- ZipCode
- Conversational PII detection models (both version 2024-11-01-preview and GA) are updated to provide enhanced AI quality and accuracy. The numeric identifier entity type now also includes Drivers License and Medicare Beneficiary Identifier.
- As of June 2024, we now provide General Availability support for the Conversational PII service (English-language only).
- Customers can now redact transcripts, chats, and other text written in a conversational style.
- These capabilities provide better confidence in AI quality. They also offer Azure SLA support, production environment support, and enterprise-grade security.
Capabilities
Currently, PII support is available for the following capabilities:
- General text PII detection for processing sensitive information (PII) and health information (PHI) in unstructured text across several predefined categories.
- Conversation PII detection, a specialized model designed to handle speech transcriptions and the informal, conversational tone found in meeting and call transcripts.
- Native Document PII detection for processing structured document files.
Azure Language is a cloud-based service that applies Natural Language Processing (NLP) features to detect categories of personal information (PII) in text-based data. This documentation contains the following article types:
- Quickstarts are getting-started instructions to guide you through making requests to the service.
- How-to guides contain instructions for using the service in more specific or customized ways.
Typical workflow
To use this feature, you submit data for analysis and handle the API output in your application. Analysis is performed as-is, with no added customization to the model used on your data.
Create an Azure Language in Foundry Tools resource, which grants you access to the features offered by Language. It generates a password (called a key) and an endpoint URL that you use to authenticate API requests.
Create a request using either the REST API or the client library for C#, Java, JavaScript, and Python. You can also send asynchronous calls with a batch request to combine API requests for multiple features into a single call.
Send the request containing your text data. Your key and endpoint are used for authentication.
Stream or store the response locally.
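The workflow above can be sketched with Python's standard library. This is a minimal example under assumptions: the `/language/:analyze-text` route, the `2023-04-01` API version, the `PiiEntityRecognition` task kind, and the `Ocp-Apim-Subscription-Key` header follow the generally available REST API, so check them against the reference for your resource. The endpoint, the key, and the `build_pii_request` helper are placeholders. The code only builds the request and doesn't send it.

```python
import json
import urllib.request

# Assumption: GA API version of the analyze-text REST API.
API_VERSION = "2023-04-01"


def build_pii_request(endpoint: str, key: str, texts: list[str],
                      language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a PII detection request for the given texts."""
    body = {
        "kind": "PiiEntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [
                {"id": str(i), "language": language, "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
    }
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # the resource key authenticates the call
        },
        method="POST",
    )


request = build_pii_request(
    "https://my-language-resource.cognitiveservices.azure.com",  # hypothetical endpoint
    "my-resource-key",  # placeholder, never hard-code a real key
    ["Call our office at 312-555-1234, or send an email to support@contoso.com."],
)
# To send it: urllib.request.urlopen(request), then json.load() the response
# and store or forward the result.
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any data leaves your application.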
Key features for text PII
Language offers named entity recognition to identify and categorize information within your text. The feature detects PII categories including names, organizations, addresses, phone numbers, financial account numbers or codes, and government identification numbers. A subset of this PII is protected health information (PHI). When you specify domain=phi in your request, only PHI entities are returned.
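As a hedged illustration of the PHI option, the sketch below builds a request body in which `domain` is set to `phi`. It assumes `domain` belongs in the task `parameters` object of the REST request body. `pii_payload` and its `phi_only` flag are hypothetical names introduced for this example.

```python
def pii_payload(text: str, phi_only: bool = False) -> dict:
    """Build an analyze-text request body; phi_only restricts results to PHI."""
    parameters = {"modelVersion": "latest"}
    if phi_only:
        # Assumption: the domain option is passed inside the task parameters.
        parameters["domain"] = "phi"
    return {
        "kind": "PiiEntityRecognition",
        "parameters": parameters,
        "analysisInput": {
            "documents": [{"id": "1", "language": "en", "text": text}]
        },
    }


phi_body = pii_payload("Patient was prescribed 50 mg of ibuprofen.", phi_only=True)
default_body = pii_payload("Patient was prescribed 50 mg of ibuprofen.")
```

Leaving `domain` out of the default body keeps the request on the full set of PII categories.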
Get started with PII detection
To use PII detection, you submit text for analysis and handle the API output in your application. Analysis is performed as-is, with no customization to the model used on your data. You can use PII detection in the following ways:
| Development option | Description |
|---|---|
| Microsoft Foundry (new) portal | Foundry (new) is a cloud-based AI platform that provides streamlined access to Foundry models, agents, and tools through Foundry projects. |
| Foundry (classic) portal | Foundry (classic) is a cloud-based platform that supports hub-based projects and other resource types. When you sign up, you can use your own data to detect personally identifying information within text examples. |
| REST API or Client library (Azure SDK) | Integrate PII detection into your applications using the REST API, or the client library available in various languages. |
Reference documentation and code samples
As you use this feature in your applications, see the following reference documentation and samples for Azure Language in Foundry Tools:
| Development option / language | Reference documentation | Samples |
|---|---|---|
| REST API | REST API documentation | |
| C# | C# documentation | C# samples |
| Java | Java documentation | Java samples |
| JavaScript | JavaScript documentation | JavaScript samples |
| Python | Python documentation | Python samples |
Input requirements and service limits
- Text PII takes text for analysis. For more information, see Data and service limits in the how-to guide.
- PII works with various written languages. For more information, see language support. You can specify in which supported languages your source text is written. If you don't specify a language, the extraction defaults to English. The API may return offsets in the response to support different multilingual and emoji encodings.
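Offsets matter when you apply redaction yourself. The sketch below masks entity spans using `offset` and `length` values. It assumes the offsets are counted in Unicode code points, which is how Python indexes strings. If your request uses a different string index type (for example, UTF-16 code units), convert the offsets first. The entity shown is invented for illustration, and `redact` is a hypothetical helper.

```python
def redact(text: str, entities: list[dict], mask: str = "*") -> str:
    """Replace each entity span with mask characters, using code point offsets."""
    redacted = text
    # Work from the end of the text so earlier offsets stay valid even if
    # a replacement ever changes the string length.
    for entity in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = entity["offset"]
        end = start + entity["length"]
        redacted = redacted[:start] + mask * entity["length"] + redacted[end:]
    return redacted


text = "Hi 😀, call 555-0100"
# Illustrative entity: the emoji counts as one code point, so the phone
# number starts at offset 11. In UTF-16 code units it would start at 12.
entities = [{"category": "PhoneNumber", "offset": 11, "length": 8}]

print(redact(text, entities))
```

With the emoji in the text, a UTF-16 offset applied directly to a Python string would mask the wrong characters, which is why the index type needs to match the language you process the response in.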
Responsible AI
An AI system includes not only the technology, but also the people who use it, the people affected by it, and the deployment environment. Read the transparency note for PII to learn about responsible AI use and deployment in your systems. For more information, see the following articles:
- Transparency note for Azure Language in Foundry Tools
- Integration and responsible use
- Data, privacy, and security
Example scenarios
- Apply sensitivity labels - For example, based on the results from the PII service, a public sensitivity label might be applied to documents where no PII entities are detected. For documents where US addresses and phone numbers are recognized, a confidential label might be applied. A highly confidential label might be used for documents where bank routing numbers are recognized.
- Redact some categories of personal information from documents that get wider circulation - For example, if customer contact records are accessible to frontline support representatives, the company can redact the customer's personal information other than their name from that version of the customer history to preserve the customer's privacy.
- Redact personal information in order to reduce unconscious bias - For example, a company can block names, addresses, and phone numbers during its resume review process to help reduce unconscious gender or other biases.
- Replace personal information in source data for machine learning to reduce unfairness - For example, if you want to remove names that might reveal gender when training a machine learning model, you could use the service to identify them and replace them with generic placeholders.
- Remove personal information from call center transcription - For example, if you want to remove names or other PII exchanged between the agent and the customer in a call center scenario, you could use the service to identify and remove them.
- Data cleaning for data science - PII detection can prepare data for data scientists and engineers to use when training machine learning models. Redacting the data helps make sure that customer data isn't exposed.
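The sensitivity label scenario above can be sketched as a small mapping from detected categories to a label. This is an illustrative sketch, not a recommended policy. The category names (`Address`, `PhoneNumber`, `ABARoutingNumber`) are assumed to match the service's category identifiers, and the "needs review" fallback for other PII is an assumption added for completeness. `sensitivity_label` is a hypothetical helper.

```python
# Assumed category identifiers; confirm them against the entity category list.
HIGHLY_CONFIDENTIAL = {"ABARoutingNumber"}
CONFIDENTIAL = {"Address", "PhoneNumber"}


def sensitivity_label(categories: list[str]) -> str:
    """Map the PII categories detected in a document to a sensitivity label."""
    found = set(categories)
    if not found:
        return "public"  # no PII entities detected
    if found & HIGHLY_CONFIDENTIAL:
        return "highly confidential"  # bank routing numbers recognized
    if found & CONFIDENTIAL:
        return "confidential"  # addresses or phone numbers recognized
    # Assumption: any other PII is escalated for manual review.
    return "needs review"


print(sensitivity_label([]))
print(sensitivity_label(["Address", "PhoneNumber"]))
print(sensitivity_label(["PhoneNumber", "ABARoutingNumber"]))
```

Checking the strictest label first means a document with both a phone number and a routing number gets the higher classification.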
Next steps
There are two ways to get started using PII detection:
- Foundry, a web-based platform that lets you use several Language features without needing to write code.
- The quickstart article, which provides instructions for making requests to the service using the REST API and client library SDK.