Enforce content safety checks on LLM requests

APPLIES TO: Developer | Basic | Basic v2 | Standard | Standard v2 | Premium | Premium v2

The llm-content-safety policy enforces content safety checks on large language model (LLM) requests (prompts) by transmitting them to the Azure AI Content Safety service before they're sent to the backend LLM API. When the policy is enabled and Azure AI Content Safety detects malicious content, API Management blocks the request and returns a 403 error code.

Use the policy in scenarios such as the following:

  • Block requests that contain predefined categories of harmful content or hate speech
  • Apply custom blocklists to prevent specific content from being sent
  • Shield against prompts that match attack patterns

Note

Set the policy's elements and child elements in the order provided in the policy statement. Learn more about how to set or edit API Management policies.

Prerequisites

  • An Azure AI Content Safety resource.
  • An API Management backend configured to route content safety API calls and authenticate to the Azure AI Content Safety service, with a URL in the form https://<content-safety-service-name>.cognitiveservices.azure.com. A managed identity with the Cognitive Services User role is recommended for authentication.

Policy statement

<llm-content-safety backend-id="name of backend entity" shield-prompt="true | false" >
    <categories output-type="FourSeverityLevels | EightSeverityLevels">
        <category name="Hate | SelfHarm | Sexual | Violence" threshold="integer" />
        <!-- If there are multiple categories, add more category elements -->
        [...]
    </categories>
    <blocklists>
        <id>blocklist-identifier</id>
        <!-- If there are multiple blocklists, add more id elements -->
        [...]
    </blocklists>
</llm-content-safety>

Attributes

Attribute | Description | Required | Default
backend-id | Identifier (name) of the Azure AI Content Safety backend to route content safety API calls to. Policy expressions are allowed. | Yes | N/A
shield-prompt | If set to true, content is checked for user attacks. Otherwise, this check is skipped. Policy expressions are allowed. | No | false
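
Both attributes allow policy expressions, and, like most policy attribute values, they can also be populated from a named value. The following is a minimal sketch only; the named value enable-shield-prompt and the backend name content-safety-backend are hypothetical:

<!-- {{enable-shield-prompt}} is a hypothetical named value that resolves to "true" or "false" -->
<llm-content-safety backend-id="content-safety-backend" shield-prompt="{{enable-shield-prompt}}">
    <categories output-type="FourSeverityLevels">
        <category name="Hate" threshold="4" />
    </categories>
</llm-content-safety>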

Elements

Element | Description | Required
categories | A list of category elements that specify settings for blocking requests when the category is detected. | No
blocklists | A list of blocklist id elements from the Azure AI Content Safety instance; detection of a blocklist match causes the request to be blocked. Policy expressions are allowed. | No

categories attributes

Attribute | Description | Required | Default
output-type | Specifies how severity levels are returned by Azure AI Content Safety. The attribute must have one of the following values: FourSeverityLevels (output severities in four levels: 0, 2, 4, 6) or EightSeverityLevels (output severities in eight levels: 0, 1, 2, 3, 4, 5, 6, 7). Policy expressions are allowed. | No | FourSeverityLevels

category attributes

Attribute | Description | Required | Default
name | Specifies the name of this category. The attribute must have one of the following values: Hate, SelfHarm, Sexual, Violence. Policy expressions are allowed. | Yes | N/A
threshold | Specifies the threshold value for this category at which requests are blocked. Requests with content severities less than the threshold aren't blocked. The value must be an integer between 0 and 7. Policy expressions are allowed. | Yes | N/A
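
As a worked example of the threshold semantics: with threshold="4", content with a severity of 0 through 3 passes, while a severity of 4 or higher blocks the request. The sketch below parameterizes the threshold through a named value; the name violence-threshold is hypothetical and is assumed to resolve to an integer between 0 and 7:

<llm-content-safety backend-id="content-safety-backend" shield-prompt="false">
    <categories output-type="EightSeverityLevels">
        <!-- {{violence-threshold}} is a hypothetical named value resolving to an integer between 0 and 7 -->
        <category name="Violence" threshold="{{violence-threshold}}" />
    </categories>
</llm-content-safety>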

Usage

Usage notes

  • The policy runs on a concatenation of all text content in a completion or chat completion request.
  • If the request exceeds the character limit of Azure AI Content Safety, a 403 error is returned.
  • This policy can be used multiple times per policy definition.

Example

The following example enforces content safety checks on LLM requests using the Azure AI Content Safety service. The policy blocks requests that contain content in the Hate or Violence categories with a severity level of 4 or higher. The shield-prompt attribute is set to true to check for adversarial attacks.

<policies>
    <inbound>
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
        </llm-content-safety>
    </inbound>
</policies>
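
The example above doesn't use the blocklists element. The following sketch extends it with custom blocklists; the blocklist identifiers and the backend name are placeholders that would correspond to blocklists defined in your Azure AI Content Safety resource:

<policies>
    <inbound>
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="FourSeverityLevels">
                <category name="Hate" threshold="4" />
            </categories>
            <blocklists>
                <!-- Placeholder identifiers of blocklists defined in the Azure AI Content Safety resource -->
                <id>forbidden-terms</id>
                <id>competitor-names</id>
            </blocklists>
        </llm-content-safety>
    </inbound>
</policies>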

For more information about working with policies, see the following articles.