FeaturizationConfig Class
Defines feature engineering configuration for automated machine learning experiments in Azure Machine Learning.
Use the FeaturizationConfig class in the featurization parameter of the AutoMLConfig class. For more information, see Configure automated ML experiments.
Create a FeaturizationConfig.
Inheritance
builtins.object -> FeaturizationConfig
Constructor
FeaturizationConfig(blocked_transformers: List[str] | None = None, column_purposes: Dict[str, str] | None = None, transformer_params: Dict[str, List[Tuple[List[str], Dict[str, Any]]]] | None = None, drop_columns: List[str] | None = None, dataset_language: str | None = None, prediction_transform_type: str | None = None)
Parameters
Name | Description |
---|---|
blocked_transformers | A list of transformer names to be blocked during featurization. Default value: None |
column_purposes | A dictionary of column names and feature types used to update column purpose. Default value: None |
transformer_params | A dictionary of transformers and their corresponding customization parameters. Default value: None |
drop_columns | A list of columns to be ignored in the featurization process. This setting is being deprecated. Instead, drop columns from your datasets as part of your data preparation process before providing the datasets to AutoML. Default value: None |
dataset_language | Three-character ISO 639-3 code for the language(s) contained in the dataset. Languages other than English are supported only if you use GPU-enabled compute. Use the language code 'mul' if the dataset contains multiple languages. To find ISO 639-3 codes for different languages, see https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes. Default value: None |
prediction_transform_type | A string naming the target transform type used to cast the target column type. Default value: None |
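The nested type of transformer_params in the constructor signature can be hard to read at a glance. The following is a minimal sketch, using only plain Python data, of what arguments matching the annotated types might look like. The column names and parameter values are borrowed from the examples in this article. The helper is_iso_639_3_shaped is hypothetical and checks only the shape of the code, not whether it is a registered ISO 639-3 code.

```python
from typing import Any, Dict, List, Tuple

# Shapes follow the type annotations in the constructor signature.
blocked_transformers: List[str] = ["LabelEncoder"]
column_purposes: Dict[str, str] = {"MYCT": "Numeric", "VendorName": "CategoricalHash"}

# Transformer name -> list of (columns, parameters) pairs.
transformer_params: Dict[str, List[Tuple[List[str], Dict[str, Any]]]] = {
    "Imputer": [
        (["CACH"], {"strategy": "median"}),
        (["PRP"], {"strategy": "most_frequent"}),
    ]
}


def is_iso_639_3_shaped(code: str) -> bool:
    """Hypothetical helper: True if code looks like a three-letter ISO 639-3 code."""
    return len(code) == 3 and code.isascii() and code.isalpha() and code.islower()


dataset_language = "mul"  # 'mul' marks a dataset that contains multiple languages
assert is_iso_639_3_shaped(dataset_language)

# Keyword names match the constructor parameters documented in this article.
constructor_kwargs: Dict[str, Any] = {
    "blocked_transformers": blocked_transformers,
    "column_purposes": column_purposes,
    "transformer_params": transformer_params,
    "dataset_language": dataset_language,
}
```

With the SDK installed, these arguments could then be passed as FeaturizationConfig(**constructor_kwargs).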
Remarks
Featurization customization has methods that allow you to:
- Add or remove column purpose. With the add_column_purpose and remove_column_purpose methods, you can override the feature type for specified columns, for example, when the feature type of a column does not correctly reflect its purpose. The add method supports all the feature types given in the FULL_SET attribute of the FeatureType class.
- Add or remove transformer parameters. With the add_transformer_params and remove_transformer_params methods, you can change the parameters of customizable transformers like Imputer, HashOneHotEncoder, and TfIdf. Customizable transformers are listed in the CUSTOMIZABLE_TRANSFORMERS attribute of the SupportedTransformers class. Use the get_transformer_params method to look up customization parameters.
- Block transformers. Prevent transformers from being used in the featurization process with the add_blocked_transformers method. The transformers must be among those listed in the BLOCKED_TRANSFORMERS attribute of the SupportedTransformers class.
- Add a drop column to ignore for featurization and training with the add_drop_columns method. For example, you can drop a column that doesn't contain useful information.
- Add or remove the prediction transform type. With the add_prediction_transform_type and remove_prediction_transform_type methods, you can override the existing target column type. Prediction transform types are listed in PredictionTransformTypes.
The following code example shows how to customize featurization in automated ML for forecasting. In the example code, setting a column purpose and adding transformer parameters are shown.
from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()
# Force the CPWVOL5 feature to be numeric type.
featurization_config.add_column_purpose("CPWVOL5", "Numeric")
# Fill missing values in the target column, Quantity, with zeros.
featurization_config.add_transformer_params(
    "Imputer", ["Quantity"], {"strategy": "constant", "fill_value": 0}
)
# Fill missing values in the INCOME column with median value.
featurization_config.add_transformer_params(
    "Imputer", ["INCOME"], {"strategy": "median"}
)
# Fill missing values in the Price column with forward fill (last value carried forward).
featurization_config.add_transformer_params("Imputer", ["Price"], {"strategy": "ffill"})
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb
The next example shows customizing featurization in a regression problem using the Hardware Performance Dataset. In the example code, a blocked transformer is defined, column purposes are added, and transformer parameters are added.
from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()
featurization_config.blocked_transformers = ["LabelEncoder"]
# featurization_config.drop_columns = ['MMIN']
featurization_config.add_column_purpose("MYCT", "Numeric")
featurization_config.add_column_purpose("VendorName", "CategoricalHash")
# The default imputation strategy is mean; override it for three columns.
featurization_config.add_transformer_params("Imputer", ["CACH"], {"strategy": "median"})
featurization_config.add_transformer_params(
    "Imputer", ["CHMIN"], {"strategy": "median"}
)
featurization_config.add_transformer_params(
    "Imputer", ["PRP"], {"strategy": "most_frequent"}
)
# featurization_config.add_transformer_params('HashOneHotEncoder', [], {"number_of_bits": 3})
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb
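The repeated add_transformer_params calls in the regression example can also be driven from a table of per-column settings. The following is a minimal sketch, assuming only that the config object exposes add_transformer_params(transformer, cols, params) as documented in this article. The helper name apply_imputer_strategies and the RecordingConfig stand-in class are hypothetical and exist only so that the sketch runs without the SDK installed.

```python
from typing import Any, Dict, List, Tuple


def apply_imputer_strategies(config: Any, strategies: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical helper: register one Imputer customization per column."""
    for column, params in strategies.items():
        config.add_transformer_params("Imputer", [column], params)


class RecordingConfig:
    """Stand-in for FeaturizationConfig that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, List[str], Dict[str, Any]]] = []

    def add_transformer_params(
        self, transformer: str, cols: List[str], params: Dict[str, Any]
    ) -> None:
        self.calls.append((transformer, cols, params))


# Same columns and strategies as the regression example.
strategies = {
    "CACH": {"strategy": "median"},
    "CHMIN": {"strategy": "median"},
    "PRP": {"strategy": "most_frequent"},
}
recorder = RecordingConfig()
apply_imputer_strategies(recorder, strategies)
```

In a real experiment, a FeaturizationConfig instance would be passed in place of the stand-in.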
The FeaturizationConfig defined in the code example above can then be used in the configuration of an automated ML experiment, as shown in the next code example.
import logging

from azureml.train.automl import AutoMLConfig

# compute_target, train_data, and label are assumed to be defined earlier, as in the full sample.
automl_settings = {
    "enable_early_stopping": True,
    "experiment_timeout_hours": 0.25,
    "max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
    "n_cross_validations": 5,
    "primary_metric": "normalized_root_mean_squared_error",
    "verbosity": logging.INFO,
}
automl_config = AutoMLConfig(
    task="regression",
    debug_log="automl_errors.log",
    compute_target=compute_target,
    featurization=featurization_config,
    training_data=train_data,
    label_column_name=label,
    **automl_settings,
)
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb
Methods
Name | Description |
---|---|
add_blocked_transformers | Add transformers to be blocked. |
add_column_purpose | Add a feature type for the specified column. |
add_drop_columns | Add column name or list of column names to ignore. |
add_prediction_transform_type | Add a prediction transform type for the target column. |
add_transformer_params | Add customized transformer parameters to the list of custom transformer parameters. Applies to all columns if the column list is empty. |
get_transformer_params | Retrieve transformer customization parameters for columns. |
remove_column_purpose | Remove the feature type for the specified column. If no feature is specified for a column, the detected default feature is used. |
remove_prediction_transform_type | Revert the prediction transform type to the default for the target column. |
remove_transformer_params | Remove transformer customization parameters for specific columns or all columns. |
add_blocked_transformers
Add transformers to be blocked.
add_blocked_transformers(transformers: str | List[str]) -> None
Parameters
Name | Description |
---|---|
transformers | Required. A transformer name or list of transformer names. Transformer names must be one of the transformers listed in the BLOCKED_TRANSFORMERS attribute of the SupportedTransformers class. |
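Because transformers accepts either a single name or a list of names, calling code may want to normalize and check names before passing them on. The following is a minimal sketch. The helper normalize_transformer_names is hypothetical, and the allowed set shown is a placeholder for illustration only. In practice, the allowed names would come from the BLOCKED_TRANSFORMERS attribute of the SupportedTransformers class.

```python
from typing import Iterable, List, Union


def normalize_transformer_names(
    transformers: Union[str, List[str]], allowed: Iterable[str]
) -> List[str]:
    """Hypothetical helper: return a list of names, rejecting any not in allowed."""
    names = [transformers] if isinstance(transformers, str) else list(transformers)
    allowed_set = set(allowed)
    unknown = [name for name in names if name not in allowed_set]
    if unknown:
        raise ValueError(f"Unsupported transformer name(s): {unknown}")
    return names


# Placeholder subset for illustration; not the real BLOCKED_TRANSFORMERS contents.
ALLOWED_PLACEHOLDER = {"LabelEncoder", "HashOneHotEncoder", "TfIdf"}

single = normalize_transformer_names("LabelEncoder", ALLOWED_PLACEHOLDER)
several = normalize_transformer_names(["LabelEncoder", "TfIdf"], ALLOWED_PLACEHOLDER)
```

The resulting list could then be handed to add_blocked_transformers.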
add_column_purpose
Add a feature type for the specified column.
add_column_purpose(column_name: str, feature_type: str) -> None
Parameters
Name | Description |
---|---|
column_name | Required. A column name to update. |
feature_type | Required. A feature type to use for the column. Feature types must be one given in the FULL_SET attribute of the FeatureType class. |
add_drop_columns
Add column name or list of column names to ignore.
add_drop_columns(drop_columns: str | List[str]) -> None
Parameters
Name | Description |
---|---|
drop_columns | Required. A column name or list of column names. |
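Because the drop_columns setting is being deprecated in favor of removing columns during data preparation, the following minimal sketch shows that idea on plain Python rows (a list of dictionaries). The helper name drop_columns_from_rows and the sample rows are hypothetical. With a pandas DataFrame, DataFrame.drop(columns=[...]) serves the same purpose.

```python
from typing import Any, Dict, Iterable, List


def drop_columns_from_rows(
    rows: List[Dict[str, Any]], columns: Iterable[str]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return copies of rows without the named columns."""
    unwanted = set(columns)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


# Made-up sample rows; MMIN is the column commented out in the regression example.
rows = [
    {"MYCT": 125, "MMIN": 256, "PRP": 198},
    {"MYCT": 29, "MMIN": 8000, "PRP": 269},
]
prepared = drop_columns_from_rows(rows, ["MMIN"])
```

The original rows are left unchanged, so the untouched data remains available for inspection.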
add_prediction_transform_type
Add a prediction transform type for the target column.
add_prediction_transform_type(prediction_transform_type: str) -> None
Parameters
Name | Description |
---|---|
prediction_transform_type | Required. A prediction transform type to be used for casting the target column. The type must be one given in the FULL_SET attribute of the PredictionTransformTypes class. |
add_transformer_params
Add customized transformer parameters to the list of custom transformer parameters.
Apply to all columns if column list is empty.
add_transformer_params(transformer: str, cols: List[str], params: Dict[str, Any]) -> None
Parameters
Name | Description |
---|---|
transformer | Required. The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class. |
cols | Required. Input columns for the specified transformer. Some transformers can take multiple input columns, specified as a list. |
params | Required. A dictionary of keywords and arguments. |
get_transformer_params
Retrieve transformer customization parameters for columns.
get_transformer_params(transformer: str, cols: List[str]) -> Dict[str, Any]
Parameters
Name | Description |
---|---|
transformer | Required. The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class. |
cols | Required. The column names to get information for. Use an empty list to specify all columns. |
Returns
Type | Description |
---|---|
Dict[str, Any] | Transformer parameter settings. |
remove_column_purpose
Remove the feature type for the specified column.
If no feature is specified for a column, the detected default feature is used.
remove_column_purpose(column_name: str) -> None
Parameters
Name | Description |
---|---|
column_name | Required. The column name to update. |
remove_prediction_transform_type
Revert the prediction transform type to the default for the target column.
remove_prediction_transform_type() -> None
remove_transformer_params
Remove transformer customization parameters for specific columns or all columns.
remove_transformer_params(transformer: str, cols: List[str] | None = None) -> None
Parameters
Name | Description |
---|---|
transformer | Required. The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class. |
cols | The column names to remove customization parameters from. Specify None (the default) to remove all customization parameters for the specified transformer. Default value: None |