Entity Recognition cognitive skill (v3)
The Entity Recognition skill (v3) extracts entities of different types from text. These entities fall under 14 distinct categories, ranging from people and organizations to URLs and phone numbers. This skill uses the Named Entity Recognition machine learning models provided by Azure AI Language.
Note
This skill is bound to Azure AI services and requires a billable resource for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing Azure AI services pay-as-you-go price.
@odata.type
Microsoft.Skills.Text.V3.EntityRecognitionSkill
Data limits
The maximum size of a record should be 50,000 characters as measured by String.Length. If you need to break up your data before sending it to the EntityRecognition skill, consider using the Text Split skill. When using a split skill, set the page length to 5000 for the best performance.
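The size check described above can be sketched in Python. This is only an illustration, not the Text Split skill: the helper names (`utf16_length`, `split_into_pages`, `prepare_record`) are hypothetical, and the split is a naive fixed-size cut that ignores sentence boundaries. The one detail worth noting is that String.Length in .NET counts UTF-16 code units, so the sketch measures length the same way instead of relying on Python's `len`.

```python
MAX_RECORD_CHARS = 50_000  # record limit stated in this section
PAGE_LENGTH = 5_000        # page length suggested in this section


def utf16_length(text: str) -> int:
    """Length in UTF-16 code units, which is what .NET's String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def split_into_pages(text: str, page_length: int = PAGE_LENGTH) -> list:
    """Naive fixed-size split by Python characters (code points); illustration only."""
    return [text[i:i + page_length] for i in range(0, len(text), page_length)]


def prepare_record(text: str) -> list:
    """Return the text as a single record if it fits the limit, otherwise as pages."""
    if utf16_length(text) <= MAX_RECORD_CHARS:
        return [text]
    return split_into_pages(text)


pages = prepare_record("a" * 60_000)
print(len(pages), len(pages[0]))
```

In an indexer pipeline the Text Split skill would normally do this work; the sketch is mainly useful for pre-checking documents before they are indexed.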
Skill parameters
Parameters are case-sensitive and are all optional.
Parameter name | Description |
---|---|
categories | Array of categories that should be extracted. Possible category types: "Person", "Location", "Organization", "Quantity", "DateTime", "URL", "Email", "personType", "Event", "Product", "Skill", "Address", "phoneNumber", "ipAddress". If no category is provided, all types are returned. |
defaultLanguageCode | Language code of the input text. If the default language code is not specified, English (en) will be used as the default language code. See the full list of supported languages. Not all entity categories are supported for all languages; see note below. |
minimumPrecision | A value between 0 and 1. If the confidence score (in the namedEntities output) is lower than this value, the entity is not returned. The default is 0. |
modelVersion | (Optional) Specifies the version of the model to use when calling the entity recognition API. It will default to the latest available when not specified. We recommend you do not specify this value unless it's necessary. |
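As a rough illustration of how these parameters fit together, the following Python sketch assembles a skill definition and checks the values locally before it is sent anywhere. The function `build_entity_skill` and the constant names are hypothetical. The category check only compares against the spellings listed in the table; whether the service accepts other casings is not stated here, so treat the check as a conservative assumption.

```python
import json
from typing import List, Optional

SKILL_TYPE = "#Microsoft.Skills.Text.V3.EntityRecognitionSkill"

# The 14 category spellings listed in the parameter table.
ALLOWED_CATEGORIES = frozenset({
    "Person", "Location", "Organization", "Quantity", "DateTime", "URL",
    "Email", "personType", "Event", "Product", "Skill", "Address",
    "phoneNumber", "ipAddress",
})


def build_entity_skill(
    categories: Optional[List[str]] = None,
    default_language_code: Optional[str] = None,
    minimum_precision: Optional[float] = None,
) -> dict:
    """Build a skill definition dict, validating optional parameters locally."""
    if minimum_precision is not None and not 0 <= minimum_precision <= 1:
        raise ValueError("minimumPrecision must be between 0 and 1")
    unknown = sorted(set(categories or []) - ALLOWED_CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {unknown}")

    skill = {
        "@odata.type": SKILL_TYPE,
        "context": "/document",
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "namedEntities", "targetName": "namedEntities"}],
    }
    # Optional parameters are emitted only when set, so the documented
    # defaults (all categories, "en", 0) apply otherwise.
    if categories:
        skill["categories"] = list(categories)
    if default_language_code:
        skill["defaultLanguageCode"] = default_language_code
    if minimum_precision is not None:
        skill["minimumPrecision"] = minimum_precision
    return skill


print(json.dumps(build_entity_skill(["Person", "Email"], "en", 0.5), indent=2))
```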
Skill inputs
Input name | Description |
---|---|
languageCode | A string indicating the language of the records. If this parameter is not specified, the default language code will be used to analyze the records. See the full list of supported languages. |
text | The text to analyze. |
Skill outputs
Note
Not all entity categories are supported for all languages. See Supported Named Entity Recognition (NER) entity categories to find out which entity categories are supported for the language you are using.
Output name | Description |
---|---|
persons | An array of strings where each string represents the name of a person. |
locations | An array of strings where each string represents a location. |
organizations | An array of strings where each string represents an organization. |
quantities | An array of strings where each string represents a quantity. |
dateTimes | An array of strings where each string represents a DateTime (as it appears in the text) value. |
urls | An array of strings where each string represents a URL. |
emails | An array of strings where each string represents an email. |
personTypes | An array of strings where each string represents a PersonType. |
events | An array of strings where each string represents an event. |
products | An array of strings where each string represents a product. |
skills | An array of strings where each string represents a skill. |
addresses | An array of strings where each string represents an address. |
phoneNumbers | An array of strings where each string represents a telephone number. |
ipAddresses | An array of strings where each string represents an IP address. |
namedEntities | An array of complex types that contains the following fields: category; subcategory; confidenceScore (the value compared against minimumPrecision); length and offset (the extent and position of the entity in the text); text (the entity as it appears in the text). |
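To make the relationship between namedEntities and the per-category arrays more concrete, here is a small Python sketch that groups entity text by category and drops entities below a confidence threshold, mirroring the minimumPrecision rule. The function name `texts_by_category` and the sample entities are hypothetical, and the sketch makes no claim about how the skill itself orders or de-duplicates its arrays.

```python
from collections import defaultdict


def texts_by_category(named_entities, minimum_precision=0.0):
    """Group entity text by category, skipping entities whose confidence
    score is lower than minimum_precision."""
    grouped = defaultdict(list)
    for entity in named_entities:
        if entity["confidenceScore"] < minimum_precision:
            continue  # lower than the threshold: not returned
        grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


# Hypothetical entities shaped like the namedEntities output.
entities = [
    {"category": "Person", "subcategory": None, "length": 11, "offset": 35,
     "confidenceScore": 0.98, "text": "Jean Martin"},
    {"category": "Organization", "subcategory": None, "length": 19, "offset": 0,
     "confidenceScore": 0.6, "text": "Contoso Corporation"},
]

print(texts_by_category(entities, minimum_precision=0.7))
```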
Sample definition
{
"@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
"context": "/document",
"categories": [ "Person", "Email"],
"defaultLanguageCode": "en",
"minimumPrecision": 0.5,
"inputs": [
{
"name": "text",
"source": "/document/content"
},
{
"name": "languageCode",
"source": "/document/language"
}
],
"outputs": [
{
"name": "persons",
"targetName": "people"
},
{
"name": "emails",
"targetName": "emails"
},
{
"name": "namedEntities",
"targetName": "namedEntities"
}
]
}
Sample input
{
"values": [
{
"recordId": "1",
"data":
{
"text": "Contoso Corporation was founded by Jean Martin. They can be reached at [email protected]",
"languageCode": "en"
}
}
]
}
Sample output
{
"values": [
{
"recordId": "1",
"data" :
{
"people": [ "Jean Martin"],
"emails":["[email protected]"],
"namedEntities":
[
{
"category": "Person",
"subcategory": null,
"length": 11,
"offset": 35,
"confidenceScore": 0.98,
"text": "Jean Martin"
},
{
"category": "Email",
"subcategory": null,
"length": 19,
"offset": 71,
"confidenceScore": 0.8,
"text": "[email protected]"
}
]
}
}
]
}
The offsets returned for entities in the output of this skill come directly from the Language service APIs. If you use them to index into the original string, use the StringInfo class in .NET to extract the correct content. For more information, see Multilingual and emoji support in Language service features.
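Outside .NET there is no StringInfo, so the same idea has to be approximated. The Python sketch below assumes that offsets and lengths count text elements, as the StringInfo recommendation implies, and it approximates a text element as a base character plus any combining marks that follow it. That is a deliberate simplification: it does not handle emoji sequences or other extended grapheme clusters, for which a full grapheme segmentation library would be needed. The helper names are hypothetical.

```python
import unicodedata


def text_elements(s: str) -> list:
    """Group each base character with the combining marks that follow it.

    Rough approximation of text elements; emoji ZWJ sequences and similar
    clusters are NOT handled.
    """
    elements = []
    for ch in s:
        if elements and unicodedata.combining(ch):
            elements[-1] += ch  # attach the combining mark to the previous element
        else:
            elements.append(ch)
    return elements


def extract_entity(s: str, offset: int, length: int) -> str:
    """Slice by text elements rather than by code points."""
    return "".join(text_elements(s)[offset:offset + length])


# Offset and length taken from the Person entity in the sample output.
sample = "Contoso Corporation was founded by Jean Martin."
print(extract_entity(sample, 35, 11))

# With a combining mark, code-point slicing and element slicing diverge.
decomposed = "Zoe\u0308 met Jean"
print(repr(decomposed[8:12]), repr(extract_entity(decomposed, 8, 4)))
```

For plain text where every text element is a single code point, ordinary slicing gives the same result; the difference only appears with combining marks, emoji, and similar sequences.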
Warning cases
If the language code for the document is unsupported, a warning is returned and no entities are extracted.