APPLIES TO:
Azure Machine Learning SDK v1 for Python
Important
This article provides information on using the Azure Machine Learning SDK v1. SDK v1 is deprecated as of March 31, 2025. Support for it will end on June 30, 2026. You can install and use SDK v1 until that date. Your existing workflows using SDK v1 will continue to operate after the end-of-support date. However, they could be exposed to security risks or breaking changes in the event of architectural changes in the product.
We recommend that you transition to the SDK v2 before June 30, 2026. For more information on SDK v2, see What is Azure Machine Learning CLI and Python SDK v2? and the SDK v2 reference.
Note
For a tutorial that uses SDK v2 to build a pipeline, see Tutorial: Use ML pipelines for production ML workflows with Python SDK v2 in a Jupyter Notebook.
In this tutorial, you learn how to build an Azure Machine Learning pipeline to prepare data and train a machine learning model. Machine learning pipelines optimize your workflow with speed, portability, and reuse, so you can focus on machine learning instead of infrastructure and automation.
The example trains a small Keras convolutional neural network to classify images in the Fashion MNIST dataset.
In this tutorial, you complete the following tasks:
- Configure workspace
- Create an Experiment to hold your work
- Provision a ComputeTarget to do the work
- Create a Dataset in which to store compressed data
- Create a pipeline step to prepare the data for training
- Define a runtime Environment in which to perform training
- Create a pipeline step to define the neural network and perform the training
- Compose a Pipeline from the pipeline steps
- Run the pipeline in the experiment
- Review the output of the steps and the trained neural network
- Register the model for further use
If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning today.
Prerequisites
- Complete Create resources to get started if you don't already have an Azure Machine Learning workspace.
- A Python environment in which you install both the azureml-core and azureml-pipeline packages. Use this environment to define and control your Azure Machine Learning resources. It's separate from the environment used at runtime for training.
Important
The SDK v1 packages (azureml-core and azureml-pipeline) require Python 3.8-3.10. Python 3.10 is recommended as it remains in security support. If you have difficulty installing the packages, make sure that python --version is a compatible release. Consult the documentation of your Python virtual environment manager (venv, conda, and so on) for instructions.
Start an interactive Python session
This tutorial uses the Python SDK for Azure Machine Learning to create and control an Azure Machine Learning pipeline. The tutorial assumes that you run the code snippets interactively in either a Python REPL environment or a Jupyter notebook.
- This tutorial is based on the image-classification.ipynb notebook found in the v1/python-sdk/tutorials/using-pipelines directory of the Azure Machine Learning Examples v1-archive branch. The source code for the steps themselves is in the keras-mnist-fashion subdirectory.
Import types
Import all the Azure Machine Learning types that you need for this tutorial:
import os
import azureml.core
from azureml.core import (
    Workspace,
    Experiment,
    Dataset,
    Datastore,
    ComputeTarget,
    Environment,
    ScriptRunConfig,
)
from azureml.data import OutputFileDatasetConfig
from azureml.core.compute import AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import Pipeline
# check core SDK version number
print("Azure Machine Learning SDK Version: ", azureml.core.VERSION)
The Azure Machine Learning SDK version should be 1.61 or the latest available version. If it isn't, upgrade by using pip install --upgrade azureml-core.
Configure workspace
Create a workspace object from the existing Azure Machine Learning workspace.
workspace = Workspace.from_config()
Important
This code snippet expects the workspace configuration to be saved in the current directory or its parent. For more information on creating a workspace, see Create workspace resources. For more information on saving the configuration to file, see Create a workspace configuration file.
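If you haven't saved a workspace configuration file yet, one way to create it is to connect by explicit identifiers and then cache the configuration locally. The following is a minimal sketch; the subscription ID, resource group, and workspace name are placeholders for your own values.
from azureml.core import Workspace
# One-time setup (sketch): connect with explicit identifiers, then write a
# config.json that later calls to Workspace.from_config() can discover.
workspace = Workspace.get(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
)
workspace.write_config(path=".", file_name="config.json")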
Create the infrastructure for your pipeline
Create an Experiment object to hold the results of your pipeline runs:
exp = Experiment(workspace=workspace, name="keras-mnist-fashion")
Create a ComputeTarget that represents the machine resource on which your pipeline runs. The simple neural network used in this tutorial trains in just a few minutes even on a CPU-based machine. If you want to use a GPU for training, set use_gpu to True. Provisioning a compute target generally takes about five minutes.
use_gpu = False
# choose a name for your cluster
cluster_name = "gpu-cluster" if use_gpu else "cpu-cluster"
found = False
# Check if this compute target already exists in the workspace.
cts = workspace.compute_targets
if cluster_name in cts and cts[cluster_name].type == "AmlCompute":
    found = True
    print("Found existing compute target.")
    compute_target = cts[cluster_name]
if not found:
    print("Creating a new compute target...")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC4AS_T4_V3" if use_gpu else "STANDARD_D2_V2",
        # vm_priority='lowpriority',  # optional
        max_nodes=4,
    )
    # Create the cluster.
    compute_target = ComputeTarget.create(workspace, cluster_name, compute_config)
    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min_node_count is provided, it will use the scale settings for the cluster.
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=10
    )
# For a more detailed view of current AmlCompute status, use get_status():
# print(compute_target.get_status().serialize())
Note
GPU availability depends on the quota of your Azure subscription and on Azure capacity. See Manage and increase quotas for resources with Azure Machine Learning.
Create a dataset for the Azure-stored data
Fashion-MNIST is a dataset of fashion images divided into 10 classes. Each image is a 28x28 grayscale image, and there are 60,000 training images and 10,000 test images. As an image classification problem, Fashion-MNIST is harder than the classic MNIST handwritten digit database. It's distributed in the same compressed binary form as the original handwritten digit database.
To create a Dataset that references the Web-based data, run:
data_urls = ["https://data4mldemo6150520719.blob.core.windows.net/demo/mnist-fashion"]
fashion_ds = Dataset.File.from_files(data_urls)
# list the files referenced by fashion_ds
print(fashion_ds.to_path())
This code completes quickly. The underlying data remains in the Azure storage resource specified in the data_urls array.
Create the data-preparation pipeline step
The first step in this pipeline converts the compressed data files of fashion_ds into a dataset in your own workspace that consists of CSV files ready for use in training. Once the dataset is registered with the workspace, your collaborators can access it for their own analysis, training, and so on.
datastore = workspace.get_default_datastore()
prepared_fashion_ds = OutputFileDatasetConfig(
    destination=(datastore, "outputdataset/{run-id}")
).register_on_complete(name="prepared_fashion_ds")
The preceding code specifies a dataset that is based on the output of a pipeline step. The underlying processed files go in the workspace's default datastore's blob storage at the path specified in destination. The dataset is registered in the workspace with the name prepared_fashion_ds.
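After the pipeline has completed at least once, collaborators can retrieve the registered dataset by name. The following is a minimal sketch, assuming the registration above has already happened:
# Sketch: fetch the latest registered version of the prepared dataset by name.
prepared = Dataset.get_by_name(workspace, name="prepared_fashion_ds")
print(prepared.name, prepared.version)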
Create the pipeline step's source
The code that you executed so far creates and controls Azure resources. Now it's time to write the code that performs the first domain-specific step: preparing the data.
If you're following along with the example in the Azure Machine Learning Examples repo, the source file is already available as keras-mnist-fashion/prepare.py.
If you're working from scratch, create a subdirectory called keras-mnist-fashion/. Create a new file, add the following code to it, and name the file prepare.py.
# prepare.py
# Converts MNIST-formatted files at the passed-in input path to a passed-in output path
import os
import sys
# Conversion routine for MNIST binary format
def convert(imgf, labelf, outf, n):
    f = open(imgf, "rb")
    l = open(labelf, "rb")
    o = open(outf, "w")
    f.read(16)
    l.read(8)
    images = []
    for i in range(n):
        image = [ord(l.read(1))]
        for j in range(28 * 28):
            image.append(ord(f.read(1)))
        images.append(image)
    for image in images:
        o.write(",".join(str(pix) for pix in image) + "\n")
    f.close()
    o.close()
    l.close()
# The MNIST-formatted source
mounted_input_path = sys.argv[1]
# The output directory to which the outputs will be written
mounted_output_path = sys.argv[2]
# Create the output directory
os.makedirs(mounted_output_path, exist_ok=True)
# Convert the training data
convert(
    os.path.join(mounted_input_path, "mnist-fashion/train-images-idx3-ubyte"),
    os.path.join(mounted_input_path, "mnist-fashion/train-labels-idx1-ubyte"),
    os.path.join(mounted_output_path, "mnist_train.csv"),
    60000,
)
# Convert the test data
convert(
    os.path.join(mounted_input_path, "mnist-fashion/t10k-images-idx3-ubyte"),
    os.path.join(mounted_input_path, "mnist-fashion/t10k-labels-idx1-ubyte"),
    os.path.join(mounted_output_path, "mnist_test.csv"),
    10000,
)
The code in prepare.py takes two command-line arguments: the first is assigned to mounted_input_path and the second to mounted_output_path. If the output directory doesn't exist, the call to os.makedirs creates it. Then the program converts the training and test data and writes the comma-separated files to mounted_output_path.
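If you want to smoke-test prepare.py outside the pipeline, you can invoke it with ordinary directory paths. The following sketch assumes you've downloaded the four Fashion-MNIST binary files into a local mnist-fashion/ folder; in the pipeline itself, Azure Machine Learning supplies the mounted paths for you.
# Optional local check (sketch). The current directory must contain a
# mnist-fashion/ folder with the binary files; the CSVs go to ./prepared.
import subprocess
subprocess.run(["python", "keras-mnist-fashion/prepare.py", ".", "./prepared"], check=True)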
Specify the pipeline step
Back in the Python environment you're using to specify the pipeline, run this code to create a PythonScriptStep for your preparation code:
script_folder = "./keras-mnist-fashion"
prep_step = PythonScriptStep(
    name="prepare step",
    script_name="prepare.py",
    # On the compute target, mount fashion_ds dataset as input, prepared_fashion_ds as output
    arguments=[fashion_ds.as_named_input("fashion_ds").as_mount(), prepared_fashion_ds],
    source_directory=script_folder,
    compute_target=compute_target,
    allow_reuse=True,
)
The call to PythonScriptStep specifies that, when the pipeline step runs:
- All the files in the script_folder directory are uploaded to the compute_target.
- Among those uploaded source files, the file prepare.py runs.
- The fashion_ds and prepared_fashion_ds datasets are mounted on the compute_target and appear as directories.
- The path to the fashion_ds files is the first argument to prepare.py. In prepare.py, this argument is assigned to mounted_input_path.
- The path to prepared_fashion_ds is the second argument to prepare.py. In prepare.py, this argument is assigned to mounted_output_path.
- Because allow_reuse is True, the step doesn't rerun until its source files or inputs change.
- This PythonScriptStep is named prepare step.
Modularity and reuse are key benefits of pipelines. Azure Machine Learning automatically detects changes to source code and Dataset inputs. If allow_reuse is True, the pipeline reuses the output of an unaffected step rather than rerunning it. If a step relies on a data source external to Azure Machine Learning that might change (for instance, a URL that contains sales data), set allow_reuse to False so that the pipeline step runs every time the pipeline runs.
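For example, a step that pulls from such an external source might be declared as in the following sketch; download_sales.py is an illustrative script name, not part of this tutorial.
# Sketch: disable reuse for a step whose external input can change between runs.
volatile_step = PythonScriptStep(
    name="download sales data",
    script_name="download_sales.py",  # illustrative script, not part of this tutorial
    source_directory=script_folder,
    compute_target=compute_target,
    allow_reuse=False,  # run this step on every pipeline submission
)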
Create the training step
After converting the data from the compressed format to CSV files, you can use it to train a convolutional neural network.
Create the training step's source
For larger pipelines, put each step's source code in a separate directory, such as src/prepare/ or src/train/. For this tutorial, use or create the train.py file in the keras-mnist-fashion/ source directory.
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import BatchNormalization
from keras.utils import to_categorical
from keras.callbacks import Callback
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from azureml.core import Run
# dataset object from the run
run = Run.get_context()
dataset = run.input_datasets["prepared_fashion_ds"]
# split dataset into train and test set
(train_dataset, test_dataset) = dataset.random_split(percentage=0.8, seed=111)
# load dataset into pandas dataframe
data_train = train_dataset.to_pandas_dataframe()
data_test = test_dataset.to_pandas_dataframe()
img_rows, img_cols = 28, 28
input_shape = (img_rows, img_cols, 1)
X = np.array(data_train.iloc[:, 1:])
y = to_categorical(np.array(data_train.iloc[:, 0]))
# split out validation data to optimize the classifier during training
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=13)
# test data
X_test = np.array(data_test.iloc[:, 1:])
y_test = to_categorical(np.array(data_test.iloc[:, 0]))
X_train = (
    X_train.reshape(X_train.shape[0], img_rows, img_cols, 1).astype("float32") / 255
)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1).astype("float32") / 255
X_val = X_val.reshape(X_val.shape[0], img_rows, img_cols, 1).astype("float32") / 255
batch_size = 256
num_classes = 10
epochs = 10
# construct neuron network
model = Sequential()
model.add(
    Conv2D(
        32,
        kernel_size=(3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        input_shape=input_shape,
    )
)
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), activation="relu"))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(num_classes, activation="softmax"))
model.compile(
    loss=keras.losses.categorical_crossentropy,
    optimizer=keras.optimizers.Adam(),
    metrics=["accuracy"],
)
# start an Azure ML run
run = Run.get_context()
class LogRunMetrics(Callback):
    # callback at the end of every epoch
    def on_epoch_end(self, epoch, log):
        # logging the same name each epoch builds up a list of values in the run
        run.log("Loss", log["loss"])
        run.log("Accuracy", log["accuracy"])
history = model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(X_val, y_val),
    callbacks=[LogRunMetrics()],
)
score = model.evaluate(X_test, y_test, verbose=0)
# log a single value
run.log("Final test loss", score[0])
print("Test loss:", score[0])
run.log("Final test accuracy", score[1])
print("Test accuracy:", score[1])
plt.figure(figsize=(6, 3))
plt.title("Fashion MNIST with Keras ({} epochs)".format(epochs), fontsize=14)
plt.plot(history.history["accuracy"], "b-", label="Accuracy", lw=4, alpha=0.5)
plt.plot(history.history["loss"], "r--", label="Loss", lw=4, alpha=0.5)
plt.legend(fontsize=12)
plt.grid(True)
# log an image
run.log_image("Loss v.s. Accuracy", plot=plt)
# create a ./outputs/model folder in the compute target
# files saved in the "./outputs" folder are automatically uploaded into run history
os.makedirs("./outputs/model", exist_ok=True)
# serialize NN architecture to JSON
model_json = model.to_json()
# save model JSON
with open("./outputs/model/model.json", "w") as f:
    f.write(model_json)
# save model weights
model.save_weights("./outputs/model/model.h5")
print("model saved in ./outputs/model folder")
Most of this code should be familiar to ML developers:
- The data is partitioned into train and validation sets for training, and a separate test subset for final scoring.
- The input shape is 28x28x1 (the 1 because the input is grayscale), the batch size is 256, and there are 10 classes.
- The number of training epochs is 10.
- The model has three convolutional layers, with max pooling and dropout, followed by a dense layer and softmax head.
- The model is fitted for 10 epochs and then evaluated.
- The model architecture is written to outputs/model/model.json and the weights to outputs/model/model.h5.
Some of the code, though, is specific to Azure Machine Learning. run = Run.get_context() retrieves a Run object, which contains the current service context. The train.py source uses this run object to retrieve the input dataset by its name. This approach is an alternative to the code in prepare.py that retrieves the dataset via the argv array of script arguments.
The run object also logs the training progress at the end of every epoch and, at the end of training, logs the graph of loss and accuracy over time.
Create the training pipeline step
The training step has a slightly more complex configuration than the preparation step. The preparation step uses only standard Python libraries. More commonly, you need to modify the runtime environment in which your source code runs.
Create a file named conda_dependencies.yml with the following contents:
dependencies:
- python=3.10
- pip:
  - azureml-core
  - azureml-dataset-runtime
  - keras==2.15.0
  - tensorflow==2.15.*
  - numpy
  - scikit-learn
  - pandas
  - matplotlib
The Environment class represents the runtime environment in which a machine learning task runs. Associate the preceding specification with the training code by using:
keras_env = Environment.from_conda_specification(
    name="keras-env", file_path="./conda_dependencies.yml"
)
train_cfg = ScriptRunConfig(
    source_directory=script_folder,
    script="train.py",
    compute_target=compute_target,
    environment=keras_env,
)
The code to create the training step is similar to the code that creates the preparation step:
train_step = PythonScriptStep(
    name="train step",
    arguments=[
        prepared_fashion_ds.read_delimited_files().as_input(name="prepared_fashion_ds")
    ],
    source_directory=train_cfg.source_directory,
    script_name=train_cfg.script,
    runconfig=train_cfg.run_config,
)
Create and run the pipeline
After you specify data inputs and outputs and create your pipeline's steps, compose them into a pipeline and run it:
pipeline = Pipeline(workspace, steps=[prep_step, train_step])
run = exp.submit(pipeline)
The Pipeline object you create runs in your workspace and is composed of the preparation and training steps you specify.
Note
This pipeline has a simple dependency graph: the training step relies on the preparation step and the preparation step relies on the fashion_ds dataset. Production pipelines often have much more complex dependencies. Steps can rely on multiple upstream steps. A source code change in an early step can have far-reaching consequences. Azure Machine Learning tracks these concerns for you. You need only pass in the array of steps. Azure Machine Learning takes care of calculating the execution graph.
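The ordering is inferred from data dependencies: train_step consumes prepared_fashion_ds, which prep_step produces, so the preparation step runs first. As a sketch of how a larger graph grows, an additional downstream step that also consumes the prepared dataset would need no explicit ordering either; evaluate.py is an illustrative script name, not part of this tutorial.
# Sketch: a hypothetical third step that also consumes the prepared dataset.
# Azure Machine Learning schedules it after prep_step automatically.
eval_step = PythonScriptStep(
    name="evaluate step",
    script_name="evaluate.py",  # illustrative
    arguments=[prepared_fashion_ds.read_delimited_files().as_input(name="prepared_fashion_ds")],
    source_directory=script_folder,
    runconfig=train_cfg.run_config,
)
# pipeline = Pipeline(workspace, steps=[prep_step, train_step, eval_step])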
The call to submit the Experiment completes quickly, and produces output similar to:
Submitted PipelineRun 5968530a-abcd-1234-9cc1-46168951b5eb
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/abc-xyz...
You can monitor the pipeline run by opening the link or you can block until it completes by running:
run.wait_for_completion(show_output=True)
Important
The first pipeline run takes roughly 15 minutes. All dependencies must be downloaded, a Docker image must be created, and the Python environment must be provisioned. Running the pipeline again takes significantly less time because the pipeline reuses those resources instead of creating them. However, total run time for the pipeline depends on the workload of your scripts and the processes that run in each pipeline step.
Once the pipeline completes, you can retrieve the metrics you logged in the training step:
run.find_step_run("train step")[0].get_metrics()
If you're satisfied with the metrics, register the model in your workspace:
run.find_step_run("train step")[0].register_model(
    model_name="keras-model",
    model_path="outputs/model/",
    datasets=[("train test data", fashion_ds)],
)
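Later, you can fetch the registered model from the workspace and download its files, for example to load the architecture and weights into Keras for inference. A minimal sketch:
from azureml.core.model import Model
# Retrieve the latest registered version of the model and download it locally.
model = Model(workspace, name="keras-model")
local_path = model.download(target_dir="downloaded_model", exist_ok=True)
print("Model files downloaded to", local_path)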
Clean up resources
Don't complete this section if you plan to run other Azure Machine Learning tutorials.
Stop the compute instance
If you used a compute instance, stop the VM when you aren't using it to reduce cost.
- In your workspace, select Compute.
- From the list, select the name of the compute instance.
- Select Stop.
- When you're ready to use the server again, select Start.
Delete everything
If you don't plan to use the resources you created, delete them so you don't incur any charges:
- In the Azure portal, in the left menu, select Resource groups.
- In the list of resource groups, select the resource group you created.
- Select Delete resource group.
- Enter the resource group name. Then, select Delete.
You can also keep the resource group but delete a single workspace. Display the workspace properties, and then select Delete.
Next steps
In this tutorial, you used the following types:
- The Workspace type represents your Azure Machine Learning workspace. It contains:
  - The Experiment that holds the results of training runs of your pipeline.
  - The Dataset that lazily loads the data held in the Fashion-MNIST datastore.
  - The ComputeTarget that represents the machines on which the pipeline steps run.
  - The Environment that is the runtime environment in which the pipeline steps run.
  - The Pipeline that composes the PythonScriptStep steps into a whole.
  - The Model that you register after you're satisfied with the training process.
The Workspace object contains references to other resources, such as notebooks and endpoints, that you didn't use in this tutorial. For more information, see What is an Azure Machine Learning workspace?
The OutputFileDatasetConfig promotes the output of a run to a file-based dataset. For more information on datasets and working with data, see How to access data.
For more information on compute targets and environments, see What are compute targets in Azure Machine Learning? and What are Azure Machine Learning environments?
The ScriptRunConfig associates a ComputeTarget and Environment with Python source files. A PythonScriptStep takes that ScriptRunConfig and defines its inputs and outputs. In this pipeline, the output was the file dataset built by the OutputFileDatasetConfig.
For more examples of how to build pipelines by using the machine learning SDK, see the example repository.