What is custom named entity recognition?

Custom named entity recognition (NER) is a cloud-based API service that uses machine learning to help you build models for your own entity recognition requirements. It's one of the specialized features available through Azure AI Language. With custom NER, you can create AI models that extract domain-specific entities from unstructured text, such as contracts or financial documents. In a custom NER project, you iteratively label data, train and evaluate your model, and improve its performance before deploying it. The quality of your labeled data is essential, because it directly affects the model's accuracy.

To simplify building and customizing your model, the service offers a custom web portal that you can access through Language Studio. To get started with the service, follow the steps in the quickstart.

This documentation contains the following article types:

  • Quickstarts are getting-started instructions to guide you through making requests to the service.
  • Concepts provide explanations of the service functionality and features.
  • How-to guides contain instructions for using the service in more specific or customized ways.

Example usage scenarios

Custom named entity recognition can be used in multiple scenarios across various industries:

Information extraction

Many financial and legal organizations extract and normalize data from thousands of complex, unstructured text sources every day. Such sources include bank statements, legal agreements, and bank forms. For example, extracting mortgage application data manually with human reviewers can take several days. Automating these steps with a custom NER model simplifies the process and saves cost, time, and effort.

Knowledge mining to enhance semantic search

Search is foundational to any app that surfaces text content to users. Common scenarios include catalog or document search, retail product search, and knowledge mining for data science. Many enterprises across various industries want to build a rich search experience over private, heterogeneous content, which includes both structured and unstructured documents. As part of their pipeline, developers can use custom NER to extract entities from the text that are relevant to their industry. These entities can then be used to enrich the indexing of the file for a more customized search experience.
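
As an illustration of the enrichment step described above, here is a minimal sketch of attaching extracted entities to a document before it is sent to a search index. The `enrich_document` helper, the document shape, and the entity categories (`Lender`, `LoanAmount`, `Borrower`) are hypothetical and exist only for this example. The `text` and `category` keys are assumed to mirror the entity fields returned by the service; check the response you actually receive.

```python
from collections import defaultdict


def enrich_document(document, entities):
    """Return a copy of `document` with extracted entities grouped by category.

    `document` and `entities` are plain dicts; this shape is an assumption
    made for the example, not a format required by any search service.
    """
    grouped = defaultdict(list)
    for entity in entities:
        values = grouped[entity["category"]]
        # Keep each distinct value once per category, in first-seen order.
        if entity["text"] not in values:
            values.append(entity["text"])
    enriched = dict(document)  # shallow copy, the input is left unchanged
    enriched["entities"] = dict(grouped)
    return enriched


# Hypothetical document and extraction output.
doc = {"id": "loan-001", "content": "Contoso Bank approved a loan of 250,000 USD for Jane Doe."}
extracted = [
    {"text": "Contoso Bank", "category": "Lender"},
    {"text": "250,000 USD", "category": "LoanAmount"},
    {"text": "Jane Doe", "category": "Borrower"},
    {"text": "Contoso Bank", "category": "Lender"},
]
indexed = enrich_document(doc, extracted)
```

Grouping by category gives the index one filterable field per entity type, which is one possible way to support the customized search experience mentioned above.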

Audit and compliance

Instead of manually reviewing long text files to audit and apply policies, IT departments in financial or legal enterprises can use custom NER to build automated solutions. These solutions can help enforce compliance policies and set up the necessary business rules based on knowledge mining pipelines that process structured and unstructured content.

Project development lifecycle

Using custom NER typically involves several different steps.

  1. Define your schema: Know your data and identify the entities you want extracted. Avoid ambiguity.

  2. Label your data: Labeling data is a key factor in determining model performance. Label precisely, consistently and completely.

    • Label precisely: Always label each entity with its correct type. Include only what you want extracted and avoid unnecessary data in your labels.
    • Label consistently: The same entity should have the same label across all the files.
    • Label completely: Label all the instances of the entity in all your files.
  3. Train the model: Your model starts learning from your labeled data.

  4. View the model's performance: After training, review evaluation results and analyze performance for improvement.

  5. Deploy the model: Deploying a model makes it available for use via the Analyze API.

  6. Extract entities: Use your custom models for entity extraction tasks.
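
The labeling guidance in step 2 can be partly checked before training. The following is a rough sketch, assuming labels are held in memory as dicts with `offset`, `length`, and `category` keys next to each file's `text`. That layout, the function names, and the sample files are hypothetical and are not the service's labels file format. The completeness check is only a substring heuristic, so treat its output as candidates to review, not as errors.

```python
from collections import defaultdict


def labeled_spans(file):
    """Yield (surface_text, category, offset) for each label in a file."""
    for label in file["labels"]:
        start = label["offset"]
        yield file["text"][start:start + label["length"]], label["category"], start


def find_inconsistent(files):
    """Surface texts that carry more than one category across all files."""
    categories = defaultdict(set)
    for file in files:
        for surface, category, _ in labeled_spans(file):
            categories[surface].add(category)
    return {s: sorted(c) for s, c in categories.items() if len(c) > 1}


def find_unlabeled(files):
    """(file name, surface text, offset) for unlabeled occurrences of a labeled text."""
    surfaces = {s for f in files for s, _, _ in labeled_spans(f)}
    missing = []
    for file in files:
        labeled_offsets = {offset for _, _, offset in labeled_spans(file)}
        for surface in sorted(surfaces):
            start = file["text"].find(surface)
            while start != -1:
                if start not in labeled_offsets:
                    missing.append((file["name"], surface, start))
                start = file["text"].find(surface, start + 1)
    return missing


# Hypothetical labeled files: "Jane Doe" is labeled two different ways,
# and "Contoso Bank" is left unlabeled in b.txt.
files = [
    {"name": "a.txt", "text": "Contoso Bank lends to Jane Doe.",
     "labels": [{"offset": 0, "length": 12, "category": "Lender"},
                {"offset": 22, "length": 8, "category": "Borrower"}]},
    {"name": "b.txt", "text": "Jane Doe repaid Contoso Bank.",
     "labels": [{"offset": 0, "length": 8, "category": "Lender"}]},
]
inconsistent = find_inconsistent(files)
unlabeled = find_unlabeled(files)
```

Here `inconsistent` flags "Jane Doe" as carrying two categories, and `unlabeled` points at the missed "Contoso Bank" mention in b.txt.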

Reference documentation and code samples

As you use custom NER, see the following reference documentation and samples for Azure AI Language:

Development option / language | Reference documentation   | Samples
REST APIs (Authoring)         | REST API documentation    |
REST APIs (Runtime)           | REST API documentation    |
C# (Runtime)                  | C# documentation          | C# samples
Java (Runtime)                | Java documentation        | Java samples
JavaScript (Runtime)          | JavaScript documentation  | JavaScript samples
Python (Runtime)              | Python documentation      | Python samples
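
For the REST runtime option listed above, a request to a deployed model might be assembled as in the sketch below. It only builds the URL, headers, and JSON body and sends nothing. The route, the `api-version` value, the `Ocp-Apim-Subscription-Key` header, and the body field names are assumptions recalled from the runtime REST reference, so verify them against that reference before use. The `build_extraction_request` helper, the display and task names, and all placeholder values are hypothetical.

```python
import json


def build_extraction_request(endpoint, key, project_name, deployment_name, texts,
                             api_version="2022-05-01"):
    """Assemble URL, headers, and JSON body for a custom NER analysis job.

    Nothing is sent here; pass the result to the HTTP client of your choice.
    The api_version default is an assumption, not a recommendation.
    """
    url = f"{endpoint.rstrip('/')}/language/analyze-text/jobs?api-version={api_version}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = {
        "displayName": "Extract custom entities",
        "analysisInput": {
            "documents": [
                # Document ids are generated as "1", "2", ... for the example.
                {"id": str(i), "language": "en", "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
        "tasks": [
            {
                "kind": "CustomEntityRecognition",
                "taskName": "custom-ner",
                "parameters": {
                    "projectName": project_name,
                    "deploymentName": deployment_name,
                },
            }
        ],
    }
    return url, headers, json.dumps(body)


# Placeholder values; replace them with your own resource details.
url, headers, payload = build_extraction_request(
    "https://<your-resource>.cognitiveservices.azure.com",
    "<your-key>",
    "<your-project>",
    "<your-deployment>",
    ["Contoso Bank approved a loan of 250,000 USD for Jane Doe."],
)
```

Analysis jobs of this kind are typically asynchronous: the submit call is expected to return a location to poll for results, so a real client would need a second request to fetch the extracted entities.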

Responsible AI

An AI system includes not only the technology, but also the people who use it, the people affected by it, and the deployment environment. Read the transparency note to learn about responsible AI use and deployment in your systems.

Next steps

  • Use the quickstart article to start using custom named entity recognition.

  • As you go through the project development lifecycle, review the glossary to learn more about the terms used throughout the documentation for this feature.

  • Remember to view the service limits for information such as regional availability.