APPLIES TO: Azure CLI ml extension v2 (current)
In this article, you learn how to deploy your MLflow model to an online endpoint for real-time inference. When you deploy your MLflow model to an online endpoint, you don't need to specify a scoring script or an environment. This functionality is known as no-code deployment.
For no-code deployment, Azure Machine Learning:
- Dynamically installs the Python packages that you list in a conda.yaml file. As a result, dependencies are installed during container runtime.
- Provides an MLflow base image, or curated environment, that contains the following items:
  - The azureml-inference-server-http package
  - The mlflow-skinny package
  - A scoring script for inferencing
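Because the dynamically installed dependencies come from the conda.yaml file inside the model folder itself, it can help to inspect that folder before you deploy. The following Python sketch lists the files in the example model used later in this article; the path assumes the cloned azureml-examples repository.

# Minimal sketch: list the files in a local MLflow model folder. No-code
# deployment reads the conda.yaml file here to install dependencies, and
# the MLmodel file describes the model's flavors.
import os

model_dir = "endpoints/online/ncd/sklearn-diabetes/model"  # path from this article's example
for name in sorted(os.listdir(model_dir)):
    print(name)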
Tip
If your workspace doesn't have public network access, you have to package MLflow models before you can deploy them to online endpoints without egress connectivity. The model packaging capability is in preview. Packaging a model avoids the need for the internet connection that Azure Machine Learning otherwise requires to dynamically install the necessary Python packages for MLflow models.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
A user account that has at least one of the following Azure role-based access control (Azure RBAC) roles:
- An Owner role for the Azure Machine Learning workspace
- A Contributor role for the Azure Machine Learning workspace
- A custom role that has Microsoft.MachineLearningServices/workspaces/onlineEndpoints/* permissions
For more information, see Manage access to Azure Machine Learning workspaces.
Access to Azure Machine Learning: install the Azure CLI and the ml extension to the Azure CLI. For installation steps, see Install and set up the CLI (v2).
About the example
The example in this article shows you how to deploy an MLflow model to an online endpoint to perform predictions. The example uses an MLflow model that's based on the Diabetes dataset. This dataset contains 10 baseline variables: age, sex, body mass index, average blood pressure, and 6 blood serum measurements obtained from 442 diabetes patients. It also contains the response of interest, a quantitative measure of disease progression one year after the date of the baseline data.
The model was trained by using a scikit-learn regressor. All the required preprocessing is packaged as a pipeline, so this model is an end-to-end pipeline that goes from raw data to predictions.
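To make the shape of such a model concrete, the following Python sketch trains and logs an end-to-end scikit-learn pipeline with MLflow. It's an illustration only: it uses scikit-learn's built-in diabetes dataset and a Ridge regressor as stand-ins, and the sample model in the repository may have been trained differently.

import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the diabetes data as a DataFrame so column names travel with the model.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Preprocessing and the regressor live in one pipeline, so the logged model
# goes from raw data to predictions without a separate scoring script.
pipeline = Pipeline([("scaler", StandardScaler()), ("regressor", Ridge())])
pipeline.fit(X, y)

with mlflow.start_run():
    mlflow.sklearn.log_model(pipeline, artifact_path="model")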
The information in this article is based on code samples from the azureml-examples repository. If you clone the repository, you can run the commands in this article locally without having to copy or paste YAML files and other files. Use the following commands to clone the repository and go to the folder for your coding language:
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli
Follow along in Jupyter Notebook
To follow along with the steps in this article, see the Deploy MLflow model to online endpoints notebook in the examples repository.
Connect to your workspace
Connect to your Azure Machine Learning workspace:
az account set --subscription <subscription-ID>
az configure --defaults workspace=<workspace-name> group=<resource-group-name> location=<location>
Register the model
You can deploy only registered models to online endpoints. The steps in this article use a model that's trained on the Diabetes dataset. In this case, you already have a local copy of the model in your cloned repository, so you only need to publish the model to the registry in the workspace. You can skip this step if the model you want to deploy is already registered.
MODEL_NAME='sklearn-diabetes'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "endpoints/online/ncd/sklearn-diabetes/model"
What if your model was logged inside a run?
If your model was logged inside a run, you can register it directly.
To register the model, you need to know its storage location:
- If you use the MLflow autolog feature, the path to the model depends on the model type and framework. Check the job output to identify the name of the model folder; this folder contains a file named MLmodel. (The sketch after this list shows one way to list a run's artifacts.)
- If you use the log_model method to manually log your models, you pass the path to the model as an argument to that method. For example, if you use mlflow.sklearn.log_model(my_model, "classifier") to log the model, classifier is the path where the model is stored.
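If you're not sure which folder in the run holds the model, you can list the run's artifacts programmatically and look for the folder that contains an MLmodel file. A minimal sketch, assuming your MLflow tracking URI is already configured for the workspace and using a placeholder run ID:

# Minimal sketch: list a run's artifacts to locate the model folder.
# "<run-id>" is a placeholder for your job's run ID.
from mlflow.tracking import MlflowClient

client = MlflowClient()
for artifact in client.list_artifacts("<run-id>"):
    print(artifact.path, "(folder)" if artifact.is_dir else "")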
You can use the Azure Machine Learning CLI v2 to create a model from training job output. The following code uses the artifacts of a job with ID $RUN_ID to register a model named $MODEL_NAME. $MODEL_PATH is the path that the job uses to store the model.
az ml model create --name $MODEL_NAME --path azureml://jobs/$RUN_ID/outputs/artifacts/$MODEL_PATH
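If your workspace is set up as the MLflow tracking server, the MLflow client offers an equivalent registration path. A minimal sketch, with a placeholder run ID and the model artifact path assumed to be model:

# Minimal sketch: register a run's model through the MLflow client instead
# of the Azure CLI. Assumes the MLflow tracking URI points at your Azure
# Machine Learning workspace; "<run-id>" is a placeholder.
import mlflow

mlflow.register_model(model_uri="runs:/<run-id>/model", name="sklearn-diabetes")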
Deploy an MLflow model to an online endpoint
Use the following code to configure the name and authentication mode of the endpoint that you want to deploy the model to:
Set an endpoint name by running the following command. First replace <YOUR_ENDPOINT_NAME> with a unique name.
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
To configure your endpoint, create a YAML file named create-endpoint.yaml that contains the following lines:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
Create the endpoint:
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/ncd/create-endpoint.yaml
Configure the deployment. A deployment is a set of resources required for hosting the model that does the actual inferencing.
Create a YAML file named sklearn-deployment.yaml that contains the following lines:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sklearn-deployment
endpoint_name: my-endpoint
model:
  name: mir-sample-sklearn-ncd-model
  version: 2
  path: sklearn-diabetes/model
  type: mlflow_model
instance_type: Standard_DS3_v2
instance_count: 1
Note
Automatic generation of the scoring_script and environment is only supported for the PyFunc model flavor. To use a different model flavor, see Customize MLflow model deployments.
Create the deployment:
az ml online-deployment create --name sklearn-deployment --endpoint $ENDPOINT_NAME -f endpoints/online/ncd/sklearn-deployment.yaml --all-traffic
If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the --package-model flag:
az ml online-deployment create --package-model --name sklearn-deployment --endpoint $ENDPOINT_NAME -f endpoints/online/ncd/sklearn-deployment.yaml --all-traffic
Assign all the traffic to the deployment. So far, the endpoint has one deployment, but none of its traffic is assigned to it.
This step isn't required in the Azure CLI if you use the --all-traffic flag during creation. If you need to change the traffic, you can use the az ml online-endpoint update --traffic command. For more information about how to update traffic, see Progressively update the traffic.
Invoke the endpoint
When your deployment is ready, you can use it to serve requests. One way to test the deployment is by using the built-in invocation capability in your deployment client. In the examples repository, the sample-request-sklearn.json file contains the following JSON code. You can use it as a sample request file for the deployment.
{"input_data": {
"columns": [
"age",
"sex",
"bmi",
"bp",
"s1",
"s2",
"s3",
"s4",
"s5",
"s6"
],
"data": [
[ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
[ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
],
"index": [0,1]
}}
Note
This file uses the input_data key instead of inputs, which MLflow serving uses. Azure Machine Learning requires a different input format to be able to automatically generate the Swagger contracts for the endpoints. For more information about expected input formats, see Deployment in the MLflow built-in server vs. deployment in Azure Machine Learning inferencing server.
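If you build requests programmatically, pandas can generate this split-oriented structure for you. A minimal sketch, assuming the same column names as the sample file:

# Minimal sketch: build the "input_data" payload from a pandas DataFrame.
# to_json(orient="split") produces the columns/index/data layout shown above.
import json
import pandas as pd

df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
        [10.0, 2.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0],
    ],
    columns=["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"],
)
payload = {"input_data": json.loads(df.to_json(orient="split"))}
print(json.dumps(payload))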
Submit a request to the endpoint:
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/ncd/sample-request-sklearn.json
The response should be similar to the following text:
[
11633.100167144921,
8522.117402884991
]
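You can also call the endpoint's REST scoring URI from any HTTP client instead of using the CLI. The following Python sketch shows one way; the scoring URI and key are placeholders, which you can retrieve with az ml online-endpoint show and az ml online-endpoint get-credentials.

# Minimal sketch: invoke the endpoint over REST. The URI and key below are
# placeholders; substitute the values for your endpoint.
import json
import requests

scoring_uri = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
api_key = "<your-endpoint-key>"

with open("endpoints/online/ncd/sample-request-sklearn.json") as f:
    payload = json.load(f)

response = requests.post(
    scoring_uri,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
)
print(response.json())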
Important
For MLflow no-code deployment, testing via local endpoints isn't currently supported.
Customize MLflow model deployments
You don't have to specify a scoring script in the deployment definition of an MLflow model to an online endpoint. But you can specify a scoring script if you want to customize your inference process.
You typically want to customize your MLflow model deployment in the following cases:
- The model doesn't have a PyFunc flavor.
- You need to customize the way you run the model. For instance, you need to use mlflow.<flavor>.load_model() to load the model with a specific flavor; see the sketch after this list.
- You need to do preprocessing or postprocessing in your scoring routine, because the model doesn't do this processing.
- The output of the model can't be nicely represented in tabular data. For instance, the output is a tensor that represents an image.
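As a minimal sketch of the flavor-specific case, the following init() variant loads the model with the scikit-learn flavor instead of pyfunc. It assumes the folder inside the registered model is named model, as in this article's example.

# Minimal sketch: load a specific flavor instead of pyfunc. Assumes the
# registered model's folder is named "model".
import os
import mlflow.sklearn

def init():
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model")
    # Returns the raw scikit-learn estimator rather than a pyfunc wrapper.
    model = mlflow.sklearn.load_model(model_path)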
Important
If you specify a scoring script for an MLflow model deployment, you also have to specify the environment that the deployment runs in.
Deploy a custom scoring script
To deploy an MLflow model that uses a custom scoring script, take the steps in the following sections.
Identify the model folder
Identify the folder that contains your MLflow model by taking the following steps:
In Azure Machine Learning studio, go to the Models section.
Select the model that you want to deploy and go to its Artifacts tab.
Take note of the folder that's displayed. When you register a model, you specify this folder.
Create a scoring script
The following scoring script, score.py, provides an example of how to perform inference with an MLflow model. You can adapt this script to your needs or change any of its parts to reflect your scenario. Notice that the folder name that you previously identified, model, is included in the init() function.
import logging
import os
import json
import mlflow
from io import StringIO
from mlflow.pyfunc.scoring_server import infer_and_parse_json_input, predictions_to_json


def init():
    global model
    global input_schema
    # "model" is the path of the mlflow artifacts when the model was registered. For automl
    # models, this is generally "mlflow-model".
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model")
    model = mlflow.pyfunc.load_model(model_path)
    input_schema = model.metadata.get_input_schema()


def run(raw_data):
    json_data = json.loads(raw_data)
    if "input_data" not in json_data.keys():
        raise Exception("Request must contain a top level key named 'input_data'")

    serving_input = json.dumps(json_data["input_data"])
    data = infer_and_parse_json_input(serving_input, input_schema)
    predictions = model.predict(data)

    result = StringIO()
    predictions_to_json(predictions, result)
    return result.getvalue()
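Before you deploy, you can smoke test the script locally. A minimal sketch, assuming score.py is importable from your working directory and that the cloned repository provides the model folder and sample request at the paths below:

# Minimal local smoke test for score.py. Paths assume the cloned
# azureml-examples repository; adjust them for your layout.
import os

# AZUREML_MODEL_DIR must be set before init() runs, because init() joins it
# with the "model" folder name.
os.environ["AZUREML_MODEL_DIR"] = "endpoints/online/ncd/sklearn-diabetes"

import score  # the score.py file shown above

score.init()
with open("endpoints/online/ncd/sample-request-sklearn.json") as f:
    print(score.run(f.read()))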
Warning
MLflow 2.0 advisory: The example scoring script works with MLflow 1.X and MLflow 2.X. However, the expected input and output formats on those versions can vary. Check your environment definition to see which MLflow version you use. MLflow 2.0 is only supported in Python 3.8 and later versions.
Create an environment
The next step is to create an environment that you can run the scoring script in. Because the model is an MLflow model, the conda requirements are also specified in the model package. For more information about the files included in an MLflow model, see The MLmodel format. You build the environment by using the conda dependencies from the file. However, you also need to include the azureml-inference-server-http package, which is required for online deployments in Azure Machine Learning.
You can create a conda definition file named conda.yaml that contains the following lines:
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pip:
      - mlflow
      - scikit-learn==1.2.2
      - cloudpickle==2.2.1
      - psutil==5.9.4
      - pandas==2.0.0
      - azureml-inference-server-http
name: mlflow-env
Note
The dependencies section of this conda file includes the azureml-inference-server-http package.
Use this conda dependencies file to create the environment:
The environment is created inline in the deployment configuration.
Create the deployment
In the endpoints/online/ncd folder, create a deployment configuration file, deployment.yml, that contains the following lines:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sklearn-diabetes-custom
endpoint_name: my-endpoint
model: azureml:sklearn-diabetes@latest
environment:
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04
  conda_file: sklearn-diabetes/environment/conda.yaml
code_configuration:
  code: sklearn-diabetes/src
  scoring_script: score.py
instance_type: Standard_F2s_v2
instance_count: 1
Create the deployment:
az ml online-deployment create -f endpoints/online/ncd/deployment.yml
Serve requests
When your deployment is complete, it's ready to serve requests. One way to test the deployment is to use the invoke method with a sample request file such as the following file, sample-request-sklearn.json:
{"input_data": {
"columns": [
"age",
"sex",
"bmi",
"bp",
"s1",
"s2",
"s3",
"s4",
"s5",
"s6"
],
"data": [
[ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
[ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
],
"index": [0,1]
}}
Submit a request to the endpoint:
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/ncd/sample-request-sklearn.json
The response should be similar to the following text:
{
"predictions": [
1095.2797413413252,
1134.585328803727
]
}
Warning
MLflow 2.0 advisory: In MLflow 1.X, the response doesn't contain the predictions key.
Clean up resources
If you no longer need the endpoint, delete its associated resources:
az ml online-endpoint delete --name $ENDPOINT_NAME --yes