APPLIES TO:
Azure Machine Learning SDK v1 for Python
Important
This article provides information on using the Azure Machine Learning SDK v1. SDK v1 is deprecated as of March 31, 2025. Support for it will end on June 30, 2026. You can install and use SDK v1 until that date. Your existing workflows using SDK v1 will continue to operate after the end-of-support date. However, they could be exposed to security risks or breaking changes in the event of architectural changes in the product.
We recommend that you transition to the SDK v2 before June 30, 2026. For more information on SDK v2, see What is Azure Machine Learning CLI and Python SDK v2? and the SDK v2 reference.
Note
For a tutorial that uses SDK v2 to build a pipeline, see Tutorial: Use ML pipelines for production ML workflows with Python SDK v2 in a Jupyter Notebook.
In this tutorial, you learn how to build an Azure Machine Learning pipeline to prepare data and train a machine learning model. Machine learning pipelines optimize your workflow with speed, portability, and reuse, so you can focus on machine learning instead of infrastructure and automation.
The example trains a small Keras convolutional neural network to classify images in the Fashion MNIST dataset.
In this tutorial, you complete the following tasks:
- Configure workspace
- Create an Experiment to hold your work
- Provision a ComputeTarget to do the work
- Create a Dataset in which to store compressed data
- Create a pipeline step to prepare the data for training
- Define a runtime Environment in which to perform training
- Create a pipeline step to define the neural network and perform the training
- Compose a Pipeline from the pipeline steps
- Run the pipeline in the experiment
- Review the output of the steps and the trained neural network
- Register the model for further use
If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning today.
Prerequisites
- Complete Create resources to get started if you don't already have an Azure Machine Learning workspace.
- A Python environment in which you install both the azureml-core and azureml-pipeline packages. Use this environment to define and control your Azure Machine Learning resources. It's separate from the environment used at runtime for training.
Important
The SDK v1 packages (azureml-core and azureml-pipeline) require Python 3.8-3.10. Python 3.10 is recommended as it remains in security support. If you have difficulty installing the packages, make sure that python --version is a compatible release. Consult the documentation of your Python virtual environment manager (venv, conda, and so on) for instructions.
Start an interactive Python session
This tutorial uses the Python SDK for Azure Machine Learning to create and control an Azure Machine Learning pipeline. The tutorial assumes that you run the code snippets interactively in either a Python REPL environment or a Jupyter notebook.
- This tutorial is based on the image-classification.ipynb notebook found in the v1/python-sdk/tutorials/using-pipelines directory of the Azure Machine Learning Examples v1-archive branch. The source code for the steps themselves is in the keras-mnist-fashion subdirectory.
Import types
Import all the Azure Machine Learning types that you need for this tutorial:
import os
import azureml.core
from azureml.core import (
    Workspace,
    Experiment,
    Dataset,
    Datastore,
    ComputeTarget,
    Environment,
    ScriptRunConfig,
)
from azureml.data import OutputFileDatasetConfig
from azureml.core.compute import AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import Pipeline
# check core SDK version number
print("Azure Machine Learning SDK Version: ", azureml.core.VERSION)
The Azure Machine Learning SDK version should be 1.61 or the latest available version. If it isn't, upgrade by using pip install --upgrade azureml-core.
Configure workspace
Create a workspace object from the existing Azure Machine Learning workspace.
workspace = Workspace.from_config()
Important
This code snippet expects the workspace configuration to be saved in the current directory or its parent. For more information on creating a workspace, see Create workspace resources. For more information on saving the configuration to file, see Create a workspace configuration file.
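If you haven't saved a workspace configuration file yet, one way to create it is to connect by explicit identifiers and then cache the configuration locally. The following is a minimal sketch; the subscription ID, resource group, and workspace name are placeholders for your own values.
from azureml.core import Workspace
# One-time setup (sketch): connect with explicit identifiers, then write a
# config.json that later calls to Workspace.from_config() can discover.
workspace = Workspace.get(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
)
workspace.write_config(path=".", file_name="config.json")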
Create the infrastructure for your pipeline
Create an Experiment object to hold the results of your pipeline runs:
exp = Experiment(workspace=workspace, name="keras-mnist-fashion")
Create a ComputeTarget that represents the machine resource on which your pipeline runs. The simple neural network used in this tutorial trains in just a few minutes even on a CPU-based machine. If you want to use a GPU for training, set use_gpu to True. Provisioning a compute target generally takes about five minutes.
use_gpu = False
# choose a name for your cluster
cluster_name = "gpu-cluster" if use_gpu else "cpu-cluster"
found = False
# Check if this compute target already exists in the workspace.
cts = workspace.compute_targets
if cluster_name in cts and cts[cluster_name].type == "AmlCompute":
    found = True
    print("Found existing compute target.")
    compute_target = cts[cluster_name]
if not found:
    print("Creating a new compute target...")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC4AS_T4_V3" if use_gpu else "STANDARD_D2_V2",
        # vm_priority='lowpriority',  # optional
        max_nodes=4,
    )
    # Create the cluster.
    compute_target = ComputeTarget.create(workspace, cluster_name, compute_config)
    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min_node_count is provided, it will use the scale settings for the cluster.
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=10
    )
# For a more detailed view of current AmlCompute status, use get_status():
# print(compute_target.get_status().serialize())
Note
GPU availability depends on the quota of your Azure subscription and on Azure capacity. See Manage and increase quotas for resources with Azure Machine Learning.
Create a dataset for the Azure-stored data
Fashion-MNIST is a dataset of fashion images divided into 10 classes. Each image is a 28x28 grayscale image, and there are 60,000 training images and 10,000 test images. As an image classification problem, Fashion-MNIST is harder than the classic MNIST handwritten digit database. It's distributed in the same compressed binary form as the original handwritten digit database.
To create a Dataset that references the Web-based data, run:
data_urls = ["https://data4mldemo6150520719.blob.core.windows.net/demo/mnist-fashion"]
fashion_ds = Dataset.File.from_files(data_urls)
# list the files referenced by fashion_ds
print(fashion_ds.to_path())
This code completes quickly. The underlying data remains in the Azure storage resource specified in the data_urls array.
Create the data-preparation pipeline step
The first step in this pipeline converts the compressed data files of fashion_ds into a dataset in your own workspace that consists of CSV files ready for use in training. Once the dataset is registered with the workspace, your collaborators can access it for their own analysis, training, and so on.
datastore = workspace.get_default_datastore()
prepared_fashion_ds = OutputFileDatasetConfig(
    destination=(datastore, "outputdataset/{run-id}")
).register_on_complete(name="prepared_fashion_ds")
The preceding code specifies a dataset that is based on the output of a pipeline step. The underlying processed files go in the workspace's default datastore's blob storage at the path specified in destination. The dataset is registered in the workspace with the name prepared_fashion_ds.
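After the pipeline has completed at least once, collaborators can retrieve the registered dataset by name. The following is a minimal sketch, assuming the registration above has already happened:
# Sketch: fetch the latest registered version of the prepared dataset by name.
prepared = Dataset.get_by_name(workspace, name="prepared_fashion_ds")
print(prepared.name, prepared.version)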
Create the pipeline step's source
The code that you executed so far creates and controls Azure resources. Now it's time to write the code that performs the first domain-specific step: preparing the data.
If you're following along with the example in the Azure Machine Learning Examples repo, the source file is already available as keras-mnist-fashion/prepare.py.
If you're working from scratch, create a subdirectory called keras-mnist-fashion/. Create a new file, add the following code to it, and name the file prepare.py.
# prepare.py
# Converts MNIST-formatted files at the passed-in input path to a passed-in output path
import os
import sys
# Conversion routine for MNIST binary format
def convert(imgf, labelf, outf, n):
    f = open(imgf, "rb")
    l = open(labelf, "rb")
    o = open(outf, "w")
    f.read(16)
    l.read(8)
    images = []
    for i in range(n):
        image = [ord(l.read(1))]
        for j in range(28 * 28):
            image.append(ord(f.read(1)))
        images.append(image)
    for image in images:
        o.write(",".join(str(pix) for pix in image) + "\n")
    f.close()
    o.close()
    l.close()
# The MNIST-formatted source
mounted_input_path = sys.argv[1]
# The output directory to which the outputs will be written
mounted_output_path = sys.argv[2]
# Create the output directory
os.makedirs(mounted_output_path, exist_ok=True)
# Convert the training data
convert(
    os.path.join(mounted_input_path, "mnist-fashion/train-images-idx3-ubyte"),
    os.path.join(mounted_input_path, "mnist-fashion/train-labels-idx1-ubyte"),
    os.path.join(mounted_output_path, "mnist_train.csv"),
    60000,
)
# Convert the test data
convert(
    os.path.join(mounted_input_path, "mnist-fashion/t10k-images-idx3-ubyte"),
    os.path.join(mounted_input_path, "mnist-fashion/t10k-labels-idx1-ubyte"),
    os.path.join(mounted_output_path, "mnist_test.csv"),
    10000,
)
The code in prepare.py takes two command-line arguments: the first is assigned to mounted_input_path and the second to mounted_output_path. If the output directory doesn't exist, the call to os.makedirs creates it. Then the program converts the training and test data and writes the comma-separated files to mounted_output_path.
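If you want to smoke-test prepare.py outside the pipeline, you can invoke it with ordinary directory paths. The following sketch assumes you've downloaded the four Fashion-MNIST binary files into a local mnist-fashion/ folder; in the pipeline itself, Azure Machine Learning supplies the mounted paths for you.
# Optional local check (sketch). The current directory must contain a
# mnist-fashion/ folder with the binary files; the CSVs go to ./prepared.
import subprocess
subprocess.run(["python", "keras-mnist-fashion/prepare.py", ".", "./prepared"], check=True)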
Specify the pipeline step
Back in the Python environment you're using to specify the pipeline, run this code to create a PythonScriptStep for your preparation code:
script_folder = "./keras-mnist-fashion"
prep_step = PythonScriptStep(
    name="prepare step",
    script_name="prepare.py",
    # On the compute target, mount fashion_ds dataset as input, prepared_fashion_ds as output
    arguments=[fashion_ds.as_named_input("fashion_ds").as_mount(), prepared_fashion_ds],
    source_directory=script_folder,
    compute_target=compute_target,
    allow_reuse=True,
)
The call to PythonScriptStep specifies that, when the pipeline step runs:
- All the files in the script_folder directory are uploaded to the compute_target.
- Among those uploaded source files, the file prepare.py runs.
- The fashion_ds and prepared_fashion_ds datasets are mounted on the compute_target and appear as directories.
- The path to the fashion_ds files is the first argument to prepare.py. In prepare.py, this argument is assigned to mounted_input_path.
- The path to prepared_fashion_ds is the second argument to prepare.py. In prepare.py, this argument is assigned to mounted_output_path.
- Because allow_reuse is True, the step doesn't rerun until its source files or inputs change.
- This PythonScriptStep is named prepare step.
Modularity and reuse are key benefits of pipelines. Azure Machine Learning automatically detects changes to source code and Dataset inputs. If allow_reuse is True, the pipeline reuses the output of an unaffected step rather than rerunning it. If a step relies on a data source external to Azure Machine Learning that might change (for instance, a URL that contains sales data), set allow_reuse to False so that the pipeline step runs every time the pipeline runs.
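For example, a step that pulls from such an external source might be declared as in the following sketch; download_sales.py is an illustrative script name, not part of this tutorial.
# Sketch: disable reuse for a step whose external input can change between runs.
volatile_step = PythonScriptStep(
    name="download sales data",
    script_name="download_sales.py",  # illustrative script, not part of this tutorial
    source_directory=script_folder,
    compute_target=compute_target,
    allow_reuse=False,  # run this step on every pipeline submission
)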
Create the training step
After converting the data from the compressed format to CSV files, you can use it to train a convolutional neural network.
Create the training step's source
For larger pipelines, put each step's source code in a separate directory, such as src/prepare/ or src/train/. For this tutorial, use or create the train.py file in the keras-mnist-fashion/ source directory.
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import BatchNormalization
from keras.utils import to_categorical
from keras.callbacks import Callback
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from azureml.core import Run
# dataset object from the run
run = Run.get_context()
dataset = run.input_datasets["prepared_fashion_ds"]
# split dataset into train and test set
(train_dataset, test_dataset) = dataset.random_split(percentage=0.8, seed=111)
# load dataset into pandas dataframe
data_train = train_dataset.to_pandas_dataframe()
data_test = test_dataset.to_pandas_dataframe()
img_rows, img_cols = 28, 28
input_shape = (img_rows, img_cols, 1)
X = np.array(data_train.iloc[:, 1:])
y = to_categorical(np.array(data_train.iloc[:, 0]))
# split out validation data to optimize the classifier during training
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=13)
# test data
X_test = np.array(data_test.iloc[:, 1:])
y_test = to_categorical(np.array(data_test.iloc[:, 0]))
X_train = (
    X_train.reshape(X_train.shape[0], img_rows, img_cols, 1).astype("float32") / 255
)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1).astype("float32") / 255
X_val = X_val.reshape(X_val.shape[0], img_rows, img_cols, 1).astype("float32") / 255
batch_size = 256
num_classes = 10
epochs = 10
# construct neuron network
model = Sequential()
model.add(
    Conv2D(
        32,
        kernel_size=(3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        input_shape=input_shape,
    )
)
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), activation="relu"))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(num_classes, activation="softmax"))
model.compile(
    loss=keras.losses.categorical_crossentropy,
    optimizer=keras.optimizers.Adam(),
    metrics=["accuracy"],
)
# start an Azure ML run
run = Run.get_context()
class LogRunMetrics(Callback):
    # callback at the end of every epoch
    def on_epoch_end(self, epoch, log):
        # logging the same name each epoch builds up a list of values in the run
        run.log("Loss", log["loss"])
        run.log("Accuracy", log["accuracy"])
history = model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(X_val, y_val),
    callbacks=[LogRunMetrics()],
)
score = model.evaluate(X_test, y_test, verbose=0)
# log a single value
run.log("Final test loss", score[0])
print("Test loss:", score[0])
run.log("Final test accuracy", score[1])
print("Test accuracy:", score[1])
plt.figure(figsize=(6, 3))
plt.title("Fashion MNIST with Keras ({} epochs)".format(epochs), fontsize=14)
plt.plot(history.history["accuracy"], "b-", label="Accuracy", lw=4, alpha=0.5)
plt.plot(history.history["loss"], "r--", label="Loss", lw=4, alpha=0.5)
plt.legend(fontsize=12)
plt.grid(True)
# log an image
run.log_image("Loss v.s. Accuracy", plot=plt)
# create a ./outputs/model folder in the compute target
# files saved in the "./outputs" folder are automatically uploaded into run history
os.makedirs("./outputs/model", exist_ok=True)
# serialize NN architecture to JSON
model_json = model.to_json()
# save model JSON
with open("./outputs/model/model.json", "w") as f:
    f.write(model_json)
# save model weights
model.save_weights("./outputs/model/model.h5")
print("model saved in ./outputs/model folder")
Most of this code should be familiar to ML developers:
- The data is partitioned into train and validation sets for training, and a separate test subset for final scoring.
- The input shape is 28x28x1 (the 1 because the input is grayscale), the batch size is 256, and there are 10 classes.
- The number of training epochs is 10.
- The model has three convolutional layers, with max pooling and dropout, followed by a dense layer and softmax head.
- The model is fitted for 10 epochs and then evaluated.
- The model architecture is written to outputs/model/model.json and the weights to outputs/model/model.h5.
Some of the code, though, is specific to Azure Machine Learning. run = Run.get_context() retrieves a Run object, which contains the current service context. The train.py source uses this run object to retrieve the input dataset by its name. This approach is an alternative to the code in prepare.py that retrieves the dataset via the argv array of script arguments.
The run object also logs the training progress at the end of every epoch and, at the end of training, logs the graph of loss and accuracy over time.
Create the training pipeline step
The training step has a slightly more complex configuration than the preparation step. The preparation step uses only standard Python libraries. More commonly, you need to modify the runtime environment in which your source code runs.
Create a file named conda_dependencies.yml with the following contents:
dependencies:
- python=3.10
- pip:
  - azureml-core
  - azureml-dataset-runtime
  - keras==2.15.0
  - tensorflow==2.15.*
  - numpy
  - scikit-learn
  - pandas
  - matplotlib
The Environment class represents the runtime environment in which a machine learning task runs. Associate the preceding specification with the training code by using:
keras_env = Environment.from_conda_specification(
    name="keras-env", file_path="./conda_dependencies.yml"
)
train_cfg = ScriptRunConfig(
    source_directory=script_folder,
    script="train.py",
    compute_target=compute_target,
    environment=keras_env,
)
The code to create the training step is similar to the code that creates the preparation step:
train_step = PythonScriptStep(
    name="train step",
    arguments=[
        prepared_fashion_ds.read_delimited_files().as_input(name="prepared_fashion_ds")
    ],
    source_directory=train_cfg.source_directory,
    script_name=train_cfg.script,
    runconfig=train_cfg.run_config,
)
Create and run the pipeline
After you specify data inputs and outputs and create your pipeline's steps, compose them into a pipeline and run it:
pipeline = Pipeline(workspace, steps=[prep_step, train_step])
run = exp.submit(pipeline)
The Pipeline object you create runs in your workspace and is composed of the preparation and training steps you specify.
Note
This pipeline has a simple dependency graph: the training step relies on the preparation step and the preparation step relies on the fashion_ds dataset. Production pipelines often have much more complex dependencies. Steps can rely on multiple upstream steps. A source code change in an early step can have far-reaching consequences. Azure Machine Learning tracks these concerns for you. You need only pass in the array of steps. Azure Machine Learning takes care of calculating the execution graph.
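The ordering is inferred from data dependencies: train_step consumes prepared_fashion_ds, which prep_step produces, so the preparation step runs first. As a sketch of how a larger graph grows, an additional downstream step that also consumes the prepared dataset would need no explicit ordering either; evaluate.py is an illustrative script name, not part of this tutorial.
# Sketch: a hypothetical third step that also consumes the prepared dataset.
# Azure Machine Learning schedules it after prep_step automatically.
eval_step = PythonScriptStep(
    name="evaluate step",
    script_name="evaluate.py",  # illustrative
    arguments=[prepared_fashion_ds.read_delimited_files().as_input(name="prepared_fashion_ds")],
    source_directory=script_folder,
    runconfig=train_cfg.run_config,
)
# pipeline = Pipeline(workspace, steps=[prep_step, train_step, eval_step])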
The call to submit the Experiment completes quickly, and produces output similar to:
Submitted PipelineRun 5968530a-abcd-1234-9cc1-46168951b5eb
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/abc-xyz...
You can monitor the pipeline run by opening the link or you can block until it completes by running:
run.wait_for_completion(show_output=True)
Important
The first pipeline run takes roughly 15 minutes. All dependencies must be downloaded, a Docker image must be created, and the Python environment must be provisioned. Running the pipeline again takes significantly less time because the pipeline reuses those resources instead of creating them. However, total run time for the pipeline depends on the workload of your scripts and the processes that run in each pipeline step.
Once the pipeline completes, you can retrieve the metrics you logged in the training step:
run.find_step_run("train step")[0].get_metrics()
If you're satisfied with the metrics, register the model in your workspace:
run.find_step_run("train step")[0].register_model(
    model_name="keras-model",
    model_path="outputs/model/",
    datasets=[("train test data", fashion_ds)],
)
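Later, you can fetch the registered model from the workspace and download its files, for example to load the architecture and weights into Keras for inference. A minimal sketch:
from azureml.core.model import Model
# Retrieve the latest registered version of the model and download it locally.
model = Model(workspace, name="keras-model")
local_path = model.download(target_dir="downloaded_model", exist_ok=True)
print("Model files downloaded to", local_path)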
Clean up resources
Don't complete this section if you plan to run other Azure Machine Learning tutorials.
Stop the compute instance
If you used a compute instance, stop the VM when you aren't using it to reduce cost.
- In your workspace, select Compute.
- From the list, select the name of the compute instance.
- Select Stop.
- When you're ready to use the server again, select Start.
Delete everything
If you don't plan to use the resources you created, delete them so you don't incur any charges:
- In the Azure portal, in the left menu, select Resource groups.
- In the list of resource groups, select the resource group you created.
- Select Delete resource group.
- Enter the resource group name. Then, select Delete.
You can also keep the resource group but delete a single workspace. Display the workspace properties, and then select Delete.
Next steps
In this tutorial, you used the following types:
- The Workspace type represents your Azure Machine Learning workspace. It contains:
  - The Experiment that holds the results of training runs of your pipeline.
  - The Dataset that lazily loads the data held in the Fashion-MNIST datastore.
  - The ComputeTarget that represents the machines on which the pipeline steps run.
  - The Environment that is the runtime environment in which the pipeline steps run.
  - The Pipeline that composes the PythonScriptStep steps into a whole.
  - The Model that you register after you're satisfied with the training process.
The Workspace object contains references to other resources, such as notebooks and endpoints, that you didn't use in this tutorial. For more information, see What is an Azure Machine Learning workspace?
The OutputFileDatasetConfig promotes the output of a run to a file-based dataset. For more information on datasets and working with data, see How to access data.
For more information on compute targets and environments, see What are compute targets in Azure Machine Learning? and What are Azure Machine Learning environments?
The ScriptRunConfig associates a ComputeTarget and Environment with Python source files. A PythonScriptStep takes that ScriptRunConfig and defines its inputs and outputs. In this pipeline, the output was the file dataset built by the OutputFileDatasetConfig.
For more examples of how to build pipelines by using the machine learning SDK, see the example repository.