Quickstart: Analyze multimodal content (preview)

The Multimodal API analyzes materials containing both image content and text content to help make applications and services safer from harmful user-generated or AI-generated content. Analyzing an image and its associated text content together can preserve context and provide a more comprehensive understanding of the content.

For more information on the way content is filtered, see the Harm categories concept page. For API input limits, see the Input requirements section of the Overview.

Important

This feature is only available in certain Azure regions. See Region availability.

Prerequisites

  • An Azure subscription - Create one for free
  • Once you have your Azure subscription, create a Content Safety resource in the Azure portal to get your key and endpoint. Enter a unique name for your resource, select your subscription, and select a resource group, supported region, and supported pricing tier. Then select Create.
    • The resource takes a few minutes to deploy. After it finishes, select Go to resource. In the left pane, under Resource Management, select Subscription Key and Endpoint. Copy the endpoint and either of the key values to a temporary location for later use.
  • cURL installed, to send the REST API request shown in this quickstart.

Analyze image with text

The following section walks through a sample multimodal moderation request with cURL.

Prepare a sample image

Choose a sample image to analyze, and download it to your device.

See Input requirements for the image limitations. If the image format is animated, the service extracts the first frame and analyzes that.

You can input your image by one of two methods: local filestream or blob storage URL.

  • Local filestream (recommended): Encode your image to base64. You can use a website like codebeautify to do the encoding. Then save the encoded string to a temporary location.
  • Blob storage URL: Upload your image to an Azure Blob Storage account. Follow the blob storage quickstart to learn how to do this. Then open Azure Storage Explorer and get the URL to your image. Save it to a temporary location.
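
As an alternative to an encoding website, the base64 step for the local filestream method can be done locally. The following is a minimal sketch using only the Python standard library; the function name `encode_image_to_base64` is hypothetical and not part of any Content Safety SDK.

```python
import base64
from pathlib import Path


def encode_image_to_base64(image_path: str) -> str:
    """Read an image file and return its bytes as a base64 ASCII string.

    Hypothetical helper for illustration; the returned string is what
    would go into the "content" field of the request body.
    """
    raw_bytes = Path(image_path).read_bytes()
    return base64.b64encode(raw_bytes).decode("ascii")
```

Calling `encode_image_to_base64("sample.jpg")` (with a file name of your choosing) returns the string to save for later use.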

Analyze image with text

Paste the command below into a text editor, and make the following changes.

  1. Replace <endpoint> with your resource endpoint URL.
  2. Replace <your_subscription_key> with your key.
  3. Populate the "image" field in the body with either a "content" field or a "blobUrl" field. For example: {"image": {"content": "<base_64_string>"}} or {"image": {"blobUrl": "<your_storage_url>"}}.
  4. Optionally replace the value of the "text" field with the text you'd like to analyze.

curl --location '<endpoint>/contentsafety/imageWithText:analyze?api-version=2024-09-15-preview' \
--header 'Ocp-Apim-Subscription-Key: <your_subscription_key>' \
--header 'Content-Type: application/json' \
--data '{
  "image": {
    "content": "<base_64_string>"
  },
  "categories": ["Hate","Sexual","Violence","SelfHarm"],
  "enableOcr": true,
  "text": "I want to kill you"
}'

Note

If you're using a blob storage URL, the request body should look like this:

{
  "image": {
    "blobUrl": "<your_storage_url>"
  }
}
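
The two forms of the "image" field can also be assembled programmatically. The sketch below is one possible way to do it, assuming the field names shown in the request bodies above; the function `build_request_body` and its parameter names are hypothetical. It rejects a call that supplies both image sources or neither, since the request accepts only one of them.

```python
from typing import Optional, Sequence


def build_request_body(
    text: Optional[str] = None,
    content_b64: Optional[str] = None,
    blob_url: Optional[str] = None,
    categories: Sequence[str] = ("Hate", "Sexual", "Violence", "SelfHarm"),
    enable_ocr: bool = True,
) -> dict:
    """Build the JSON body as a dict (hypothetical helper for illustration)."""
    # Exactly one image source must be given: base64 content or a blob URL.
    if (content_b64 is None) == (blob_url is None):
        raise ValueError("Provide exactly one of content_b64 or blob_url.")

    if content_b64 is not None:
        image = {"content": content_b64}
    else:
        image = {"blobUrl": blob_url}

    body = {
        "image": image,
        "categories": list(categories),
        "enableOcr": enable_ocr,
    }
    # "text" is optional, so it is only included when supplied.
    if text is not None:
        body["text"] = text
    return body
```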

The following fields must be included in the URL:

| Name | Required? | Description | Type |
| --- | --- | --- | --- |
| API Version | Required | The API version to use. The current version is 2024-09-15-preview. Example: <endpoint>/contentsafety/imageWithText:analyze?api-version=2024-09-15-preview | String |

The parameters in the request body are defined in this table:

| Name | Description | Type |
| --- | --- | --- |
| content or blobUrl | (Required) The image to analyze, given either as base64-encoded bytes (content) or as a blob URL (blobUrl). If both are given, the request is refused. The maximum image size is 7,200 x 7,200 pixels, the maximum file size is 4 MB, and the minimum image size is 50 x 50 pixels. | String |
| text | (Optional) The text attached to the image. At most 1000 characters (Unicode code points) are supported in one text request. | String |
| enableOcr | (Required) When set to true, the service performs OCR and analyzes the detected text together with the input image. At most 1000 characters (Unicode code points) are recognized from the input image; any text beyond that is truncated. | Boolean |
| categories | (Optional) An array of category names. See the Harm categories guide for a list of available category names. If no categories are specified, all four categories are used. Specifying multiple categories returns scores for all of them in a single request. | Enum |
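
The limits in this table can be checked on the client before a request is sent. The following is a rough sketch, not an official validator: the function name is hypothetical, "4 MB" is interpreted as 4 MiB, the file size is assumed to refer to the original image file rather than its base64 encoding, and the image dimensions are passed in by the caller because reading them would require an image library.

```python
from typing import List, Optional

MAX_TEXT_CODE_POINTS = 1000
MAX_IMAGE_BYTES = 4 * 1024 * 1024  # assumption: "4 MB" read as 4 MiB
MIN_SIDE_PX = 50
MAX_SIDE_PX = 7200


def find_limit_violations(
    text: Optional[str],
    image_size_bytes: int,
    width_px: int,
    height_px: int,
) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    # len() on a Python str counts Unicode code points, matching the text limit.
    if text is not None and len(text) > MAX_TEXT_CODE_POINTS:
        problems.append(f"text has {len(text)} code points; limit is {MAX_TEXT_CODE_POINTS}")
    if image_size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"image is {image_size_bytes} bytes; limit is {MAX_IMAGE_BYTES}")
    if min(width_px, height_px) < MIN_SIDE_PX:
        problems.append(f"image is smaller than {MIN_SIDE_PX} x {MIN_SIDE_PX} pixels")
    if max(width_px, height_px) > MAX_SIDE_PX:
        problems.append(f"image is larger than {MAX_SIDE_PX} x {MAX_SIDE_PX} pixels")
    return problems
```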

Open a command prompt window and run the cURL command.
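
For readers who prefer a script to the command line, the same request can be sketched in Python with the standard library. This mirrors the cURL command above (same path, headers, and api-version value); the function names are hypothetical, and error handling is left out for brevity.

```python
import json
import urllib.request

API_VERSION = "2024-09-15-preview"  # taken from the cURL command above


def build_analyze_request(endpoint: str, subscription_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    url = f"{endpoint.rstrip('/')}/contentsafety/imageWithText:analyze?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def analyze_image_with_text(endpoint: str, subscription_key: str, body: dict) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_analyze_request(endpoint, subscription_key, body)
    # urlopen raises urllib.error.HTTPError on a non-2xx response.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers before any network call is made.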

Output

You should see the image and text moderation results displayed as JSON data in the console. For example:

{
  "categoriesAnalysis": [
    {
      "category": "Hate",
      "severity": 2
    },
    {
      "category": "SelfHarm",
      "severity": 0
    },
    {
      "category": "Sexual",
      "severity": 0
    },
    {
      "category": "Violence",
      "severity": 0
    }
  ]
}

The JSON fields in the output are defined here:

| Name | Description | Type |
| --- | --- | --- |
| categoriesAnalysis | Each output class that the API predicts. Classification can be multi-labeled. For example, an image submitted to the moderation model could be classified as both sexual content and violence. See Harm categories. | String |
| severity | The severity level of the flag in each harm category. See Harm categories. | Integer |
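
An application typically turns these fields into a decision. The sketch below shows one way to list the categories at or above a chosen severity; the function name and the default threshold of 2 are assumptions for illustration, not values the service prescribes.

```python
from typing import List


def flagged_categories(response: dict, threshold: int = 2) -> List[str]:
    """Return category names whose severity is at or above the threshold.

    The threshold is a hypothetical application-side setting.
    """
    return [
        item["category"]
        for item in response.get("categoriesAnalysis", [])
        if item.get("severity", 0) >= threshold
    ]


# The example output shown earlier in this quickstart.
sample_response = {
    "categoriesAnalysis": [
        {"category": "Hate", "severity": 2},
        {"category": "SelfHarm", "severity": 0},
        {"category": "Sexual", "severity": 0},
        {"category": "Violence", "severity": 0},
    ]
}

print(flagged_categories(sample_response))  # prints ['Hate']
```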