CLI (v2) online endpoint YAML schema

APPLIES TO: Azure CLI ml extension v2 (current)

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json for managed online endpoints, and at https://azuremlschemas.azureedge.net/latest/kubernetesOnlineEndpoint.schema.json for Kubernetes online endpoints. The differences between the two endpoint types are described in the table of properties in this article. The samples in this article focus on managed online endpoints.

Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.

Note

A fully specified sample YAML for managed online endpoints is available for reference.

YAML syntax

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| $schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including $schema at the top of your file enables you to invoke schema and resource completions. | | |
| name | string | Required. Name of the endpoint. Must be unique at the Azure region level. Naming rules are defined under endpoint limits. | | |
| description | string | Description of the endpoint. | | |
| tags | object | Dictionary of tags for the endpoint. | | |
| auth_mode | string | The authentication method for invoking the endpoint (data plane operation). Use key for key-based authentication. Use aml_token for Azure Machine Learning token-based authentication. Use aad_token for Microsoft Entra token-based authentication. | key, aml_token, aad_token | key |
| compute | string | Name of the compute target to run the endpoint deployments on. This field applies only to endpoint deployments to Azure Arc-enabled Kubernetes clusters (the compute target specified in this field must have type: kubernetes). Don't specify this field if you're doing managed online inference. | | |
| identity | object | The managed identity configuration for accessing Azure resources for endpoint provisioning and inference. | | |
| identity.type | string | The type of managed identity. If the type is user_assigned, the identity.user_assigned_identities property must also be specified. | system_assigned, user_assigned | |
| identity.user_assigned_identities | array | List of fully qualified resource IDs of the user-assigned identities. | | |
| traffic | object | The percentage of requests to be served by each deployment. It's represented by a dictionary of key-value pairs, where each key is a deployment name and each value is the percentage of traffic sent to that deployment. For example, blue: 90 and green: 10 means that 90% of requests are sent to the deployment named blue and 10% are sent to the deployment named green. Total traffic must either be 0 or sum to 100. See Safe rollout for online endpoints to see the traffic configuration in action. Note: You can't set this field during online endpoint creation, because the deployments under that endpoint must be created before traffic can be set. After the deployments have been created, you can update the traffic for an online endpoint by using az ml online-endpoint update; for example, az ml online-endpoint update --name <endpoint_name> --traffic "blue=90 green=10". | | |
| public_network_access | string | This flag controls the visibility of the managed endpoint. When disabled, inbound scoring requests are received using the private endpoint of the Azure Machine Learning workspace and the endpoint can't be reached from public networks. This flag applies only to managed endpoints. | enabled, disabled | enabled |
| mirror_traffic | string | Percentage of live traffic to mirror to a deployment. Mirroring traffic doesn't change the results returned to clients. The mirrored percentage of traffic is copied and submitted to the specified deployment so you can gather metrics and logging without impacting clients, for example to check that latency is within acceptable bounds and that there are no HTTP errors. It's represented by a dictionary with a single key-value pair, where the key is the deployment name and the value is the percentage of traffic to mirror to that deployment. For more information, see Test a deployment with mirrored traffic. | | |
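
To make the traffic property concrete, the following is a minimal sketch of an endpoint YAML that splits live traffic between two deployments. The deployment names blue and green are taken from the example in the table, the endpoint name is a placeholder, and the sketch assumes that both deployments already exist under the endpoint, so it applies to an update rather than to endpoint creation.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
# Keys are deployment names; values are percentages of live traffic.
# The values must sum to 100, or all be 0.
traffic:
  blue: 90
  green: 10
```

This expresses the same split as the --traffic "blue=90 green=10" argument shown in the description of the traffic property.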

Remarks

The az ml online-endpoint commands can be used for managing Azure Machine Learning online endpoints.

Examples

Examples are available in the examples GitHub repository. Several are shown below.

YAML: basic

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
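
For comparison, a Kubernetes online endpoint uses the Kubernetes schema URL and sets the compute field. The following is a minimal sketch only: the endpoint name, description, tag, and compute target name (my-k8s-compute) are hypothetical, and the azureml: prefix used to reference the compute target is an assumption about the reference syntax rather than something stated in the table above.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/kubernetesOnlineEndpoint.schema.json
name: my-k8s-endpoint
description: Example endpoint on an Azure Arc-enabled Kubernetes cluster
tags:
  environment: test
# Hypothetical compute target name; the target must have type: kubernetes.
compute: azureml:my-k8s-compute
auth_mode: key
```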

YAML: system-assigned identity

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-sai-endpoint
auth_mode: key
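
The sample above doesn't include an identity block. A sketch that states the identity type explicitly, using the system_assigned value listed for identity.type, might look like the following. Whether the block is needed here, or can be omitted as in the sample above, isn't stated in this article.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-sai-endpoint
auth_mode: key
# Explicitly request a system-assigned managed identity.
identity:
  type: system_assigned
```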

YAML: user-assigned identity

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-uai-endpoint
auth_mode: key
identity:
  type: user_assigned
  user_assigned_identities:
    - resource_id: user_identity_ARM_id_place_holder
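
The placeholder in the sample above stands for the fully qualified resource ID of the user-assigned identity. As an illustration only, such an ID generally has the shape shown below; the segments in angle brackets are placeholders to replace with your own values.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-uai-endpoint
auth_mode: key
identity:
  type: user_assigned
  user_assigned_identities:
    # Replace the angle-bracket segments with your own values.
    - resource_id: /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>
```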

Next steps