HyperDriveStep Class
Creates an Azure ML Pipeline step to run hyperparameter tuning for Machine Learning model training.
For an example of using HyperDriveStep, see the notebook https://aka.ms/pl-hyperdrive.
- Inheritance: HyperDriveStep
Constructor
HyperDriveStep(name, hyperdrive_config, estimator_entry_script_arguments=None, inputs=None, outputs=None, metrics_output=None, allow_reuse=True, version=None)
Parameters
Name | Description |
---|---|
name | [Required] The name of the step. |
hyperdrive_config | [Required] A HyperDriveConfig that defines the configuration for the HyperDrive run. |
estimator_entry_script_arguments | A list of command-line arguments for the estimator entry script. If the Estimator's entry script does not accept command-line arguments, set this parameter value to an empty list. Default value: None |
inputs | (list[Union[InputPortBinding, PipelineOutputAbstractDataset, DataReference, PortDataReference, PipelineData, DatasetConsumptionConfig]]) A list of input port bindings. Default value: None |
outputs | A list of output port bindings. Default value: None |
metrics_output | An optional value specifying the location to store HyperDrive run metrics as a JSON file. Default value: None |
allow_reuse | Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When reusing the step, instead of submitting the job to compute, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed. Default value: True |
version | An optional version tag to denote a change in functionality for the module. Default value: None |
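For instance, a minimal construction that also captures the run metrics might look like the following (a sketch only; hd_config and metrics_data are assumed to be defined as in the Remarks example further below):
hd_step = HyperDriveStep(name='hd_step01',
                         hyperdrive_config=hd_config,    # assumed HyperDriveConfig
                         metrics_output=metrics_data,    # assumed PipelineData for the metrics JSON
                         allow_reuse=True)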
Remarks
Note that the arguments to the entry script used in the estimator object (e.g., the TensorFlow object) must be specified as a list using the estimator_entry_script_arguments parameter when instantiating a HyperDriveStep. The estimator parameter script_params accepts a dictionary; however, the estimator_entry_script_arguments parameter expects arguments as a list.
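For example, assuming a hypothetical entry script that parses --data-folder and --batch-size flags, the arguments would be passed as a flat list rather than a dictionary (a sketch; the flag names and the data_folder reference are illustrative):
hd_step = HyperDriveStep(
    name='hd_step01',
    hyperdrive_config=hd_config,
    # List form; a dict such as {'--batch-size': '64'} would only be valid
    # for the estimator's script_params parameter, not here.
    estimator_entry_script_arguments=['--data-folder', data_folder,
                                      '--batch-size', '64'])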
The HyperDriveStep initialization involves specifying a list of DataReference objects with the inputs parameter. In Azure ML Pipelines, a pipeline step can take another step's output or DataReference objects as input. Therefore, when creating a HyperDriveStep, the inputs and outputs parameters must be set explicitly, which overrides the inputs parameter specified in the Estimator object.
The best practice for working with HyperDriveStep is to use a separate folder for scripts and any dependent files associated with the step, and specify that folder as the estimator object's source_directory. For example, see the source_directory parameter of the TensorFlow class. Doing so has two benefits. First, it helps reduce the size of the snapshot created for the step because only what is needed for the step is snapshotted. Second, the step's output from a previous run can be reused if there are no changes to the source_directory that would trigger a re-upload of the snapshot.
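As a sketch of how the hd_config used in the example below might be assembled with a dedicated source_directory (the folder, script name, metric name, and sampling ranges are illustrative assumptions):
from azureml.train.dnn import TensorFlow
from azureml.train.hyperdrive import (HyperDriveConfig, PrimaryMetricGoal,
                                      RandomParameterSampling, uniform)

# Keep scripts and dependencies in their own folder so only that folder is snapshotted.
estimator = TensorFlow(source_directory='./train_scripts',  # hypothetical folder
                       entry_script='train.py',             # hypothetical script
                       compute_target=compute_target)       # assumed to be defined earlier

hd_config = HyperDriveConfig(estimator=estimator,
                             hyperparameter_sampling=RandomParameterSampling(
                                 {'--learning-rate': uniform(1e-4, 1e-1)}),  # illustrative
                             primary_metric_name='accuracy',                 # illustrative
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=8)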
The following example shows how to use HyperDriveStep in an Azure Machine Learning Pipeline.
from azureml.pipeline.core import PipelineData, TrainingOutput
from azureml.pipeline.steps import HyperDriveStep

# datastore, hd_config, and data_folder are assumed to be defined earlier.
metrics_output_name = 'metrics_output'
metrics_data = PipelineData(name='metrics_data',
                            datastore=datastore,
                            pipeline_output_name=metrics_output_name,
                            training_output=TrainingOutput("Metrics"))

model_output_name = 'model_output'
saved_model = PipelineData(name='saved_model',
                           datastore=datastore,
                           pipeline_output_name=model_output_name,
                           training_output=TrainingOutput("Model",
                                                          model_file="outputs/model/saved_model.pb"))

hd_step_name = 'hd_step01'
hd_step = HyperDriveStep(
    name=hd_step_name,
    hyperdrive_config=hd_config,
    inputs=[data_folder],
    outputs=[metrics_data, saved_model])
The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb
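Once the step is defined, the pipeline can be submitted and the metrics output downloaded after the run completes (a sketch; the workspace ws and the experiment name are assumptions):
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=ws, steps=[hd_step])  # ws assumed to be defined earlier
pipeline_run = Experiment(ws, 'hd_pipeline_sample').submit(pipeline)
pipeline_run.wait_for_completion()

# Retrieve the JSON metrics file produced by the HyperDrive step.
metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)
metrics_output.download('.', show_progress=True)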
Methods
Name | Description |
---|---|
create_node | Create a node from the HyperDrive step and add it to the given graph. This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that the step can be added to a pipeline graph that represents the workflow. |
create_node
Create a node from the HyperDrive step and add it to the given graph.
This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that the step can be added to a pipeline graph that represents the workflow.
create_node(graph, default_datastore, context)
Parameters
Name | Description |
---|---|
graph | [Required] The graph object to add the node to. |
default_datastore | [Required] The default datastore. |
context | [Required] (azureml.pipeline.core._GraphContext) The graph context. |
Returns
Type | Description |
---|---|
| The created node. |