FeaturizationConfig Class

Defines feature engineering configuration for automated machine learning experiments in Azure Machine Learning.

Use the FeaturizationConfig class in the featurization parameter of the AutoMLConfig class. For more information, see Configure automated ML experiments.

Create a FeaturizationConfig.

Inheritance
builtins.object
FeaturizationConfig

Constructor

FeaturizationConfig(blocked_transformers: List[str] | None = None, column_purposes: Dict[str, str] | None = None, transformer_params: Dict[str, List[Tuple[List[str], Dict[str, Any]]]] | None = None, drop_columns: List[str] | None = None, dataset_language: str | None = None, prediction_transform_type: str | None = None)
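
The nested type of transformer_params in the signature above can be easier to read as a concrete value. The following is a minimal sketch that only builds plain Python objects matching the annotated shapes; it does not import the SDK. The column and transformer names are reused from the examples in the Remarks section, and combining them in a single configuration is an assumption made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Each transformer name maps to a list of (columns, parameters) pairs.
TransformerParams = Dict[str, List[Tuple[List[str], Dict[str, Any]]]]

transformer_params: TransformerParams = {
    "Imputer": [
        (["INCOME"], {"strategy": "median"}),
        (["Quantity"], {"strategy": "constant", "fill_value": 0}),
    ],
}

# Keyword arguments shaped like the constructor parameters (illustrative values).
constructor_kwargs: Dict[str, Any] = {
    "blocked_transformers": ["LabelEncoder"],
    "column_purposes": {"CPWVOL5": "Numeric"},
    "transformer_params": transformer_params,
    "dataset_language": "eng",  # ISO 639-3 code for English
}

# With the SDK installed, these would be passed as:
#     featurization_config = FeaturizationConfig(**constructor_kwargs)
for name, value in constructor_kwargs.items():
    print(f"{name}: {value!r}")
```

Every argument is optional, so in practice you can also start from an empty FeaturizationConfig() and use the add_* methods described below.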

Parameters

Name Description
blocked_transformers
list[str]

A list of transformer names to be blocked during featurization.

Default value: None
column_purposes
dict[str, str]

A dictionary of column names and feature types used to update column purpose.

Default value: None
transformer_params
dict[str, list[tuple[list[str], dict[str, Any]]]]

A dictionary of transformers and their corresponding customization parameters.

Default value: None
drop_columns
list[str]

A list of columns to be ignored in the featurization process. This setting is being deprecated. Please drop columns from your datasets as part of your data preparation process before providing the datasets to AutoML.

Default value: None
prediction_transform_type
str

The target transform type used to cast the target column type.

Default value: None
dataset_language
str

Three-character ISO 639-3 code for the language(s) contained in the dataset. Languages other than English are only supported if you use GPU-enabled compute. Use the language code 'mul' if the dataset contains multiple languages. To find ISO 639-3 codes for different languages, refer to https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes.

Default value: None

Remarks

Featurization customization has methods that allow you to:

  • Add or remove column purpose. With the add_column_purpose and remove_column_purpose methods you can override the feature type for specified columns, for example, when the feature type of a column does not correctly reflect its purpose. The add method supports all the feature types given in the FULL_SET attribute of the FeatureType class.

  • Add or remove transformer parameters. With the add_transformer_params and remove_transformer_params methods you can change the parameters of customizable transformers like Imputer, HashOneHotEncoder, and TfIdf. Customizable transformers are listed in the CUSTOMIZABLE_TRANSFORMERS attribute of the SupportedTransformers class. Use the get_transformer_params method to look up customization parameters.

  • Block transformers. Use the add_blocked_transformers method to exclude transformers from the featurization process. Each transformer must be one of those listed in the BLOCKED_TRANSFORMERS attribute of the SupportedTransformers class.

  • Drop columns. Use the add_drop_columns method to add a column to ignore during featurization and training. For example, you can drop a column that doesn't contain useful information.

  • Add or remove prediction transform type. With the add_prediction_transform_type and remove_prediction_transform_type methods you can override the existing target column type. Prediction transform types are listed in the PredictionTransformTypes class.

The following code example shows how to customize featurization in automated ML for forecasting. The example sets a column purpose and adds transformer parameters.


   from azureml.automl.core.featurization import FeaturizationConfig

   featurization_config = FeaturizationConfig()
   # Force the CPWVOL5 feature to be numeric type.
   featurization_config.add_column_purpose("CPWVOL5", "Numeric")
   # Fill missing values in the target column, Quantity, with zeros.
   featurization_config.add_transformer_params(
       "Imputer", ["Quantity"], {"strategy": "constant", "fill_value": 0}
   )
   # Fill missing values in the INCOME column with median value.
   featurization_config.add_transformer_params(
       "Imputer", ["INCOME"], {"strategy": "median"}
   )
   # Fill missing values in the Price column with forward fill (last value carried forward).
   featurization_config.add_transformer_params("Imputer", ["Price"], {"strategy": "ffill"})

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb

The next example shows customizing featurization in a regression problem using the Hardware Performance Dataset. In the example code, a blocked transformer is defined, column purposes are added, and transformer parameters are added.


   from azureml.automl.core.featurization import FeaturizationConfig

   featurization_config = FeaturizationConfig()
   featurization_config.blocked_transformers = ["LabelEncoder"]
   # featurization_config.drop_columns = ['MMIN']
   featurization_config.add_column_purpose("MYCT", "Numeric")
   featurization_config.add_column_purpose("VendorName", "CategoricalHash")
   # The default strategy is mean; add transformer parameters for 3 columns.
   featurization_config.add_transformer_params("Imputer", ["CACH"], {"strategy": "median"})
   featurization_config.add_transformer_params(
       "Imputer", ["CHMIN"], {"strategy": "median"}
   )
   featurization_config.add_transformer_params(
       "Imputer", ["PRP"], {"strategy": "most_frequent"}
   )
   # featurization_config.add_transformer_params('HashOneHotEncoder', [], {"number_of_bits": 3})

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb

The FeaturizationConfig defined in the code example above can then be used in the configuration of an automated ML experiment, as shown in the next code example.


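   import logging

   from azureml.train.automl import AutoMLConfig

   # compute_target, train_data, and label are defined earlier in the full sample.
   automl_settings = {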
       "enable_early_stopping": True,
       "experiment_timeout_hours": 0.25,
       "max_concurrent_iterations": 4,
       "max_cores_per_iteration": -1,
       "n_cross_validations": 5,
       "primary_metric": "normalized_root_mean_squared_error",
       "verbosity": logging.INFO,
   }

   automl_config = AutoMLConfig(
       task="regression",
       debug_log="automl_errors.log",
       compute_target=compute_target,
       featurization=featurization_config,
       training_data=train_data,
       label_column_name=label,
       **automl_settings,
   )

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb

Methods

add_blocked_transformers

Add transformers to be blocked.

add_column_purpose

Add a feature type for the specified column.

add_drop_columns

Add a column name or a list of column names to ignore.

add_prediction_transform_type

Add a prediction transform type for the target column.

add_transformer_params

Add customized transformer parameters to the list of custom transformer parameters.

Apply to all columns if column list is empty.

get_transformer_params

Retrieve transformer customization parameters for columns.

remove_column_purpose

Remove the feature type for the specified column.

If no feature type is specified for a column, the detected default feature type is used.

remove_prediction_transform_type

Revert the prediction transform type to the default for the target column.

remove_transformer_params

Remove transformer customization parameters for specific columns or for all columns.

add_blocked_transformers

Add transformers to be blocked.

add_blocked_transformers(transformers: str | List[str]) -> None

Parameters

Name Description
transformers
Required
str or list[str]

A transformer name or list of transformer names. Each transformer name must be one of those listed in the BLOCKED_TRANSFORMERS attribute of the SupportedTransformers class.
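
Because the parameter accepts either a single name or a list of names, both call forms describe the same kind of request. The sketch below uses a hypothetical helper, as_name_list, to show one way such an argument can be normalized; it is not SDK code, and whether a given name can actually be blocked depends on the BLOCKED_TRANSFORMERS attribute.

```python
from typing import List, Union


def as_name_list(names: Union[str, List[str]]) -> List[str]:
    """Return a list of names whether one name or several were given."""
    # A bare string is wrapped; a list is copied so the caller's list is untouched.
    return [names] if isinstance(names, str) else list(names)


single = as_name_list("LabelEncoder")
several = as_name_list(["LabelEncoder", "TfIdf"])
print(single)
print(several)
```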

add_column_purpose

Add a feature type for the specified column.

add_column_purpose(column_name: str, feature_type: str) -> None

Parameters

Name Description
column_name
Required
str

A column name to update.

feature_type
Required
str

A feature type to use for the column. The feature type must be one of those given in the FULL_SET attribute of the FeatureType class.

add_drop_columns

Add a column name or a list of column names to ignore.

add_drop_columns(drop_columns: str | List[str]) -> None

Parameters

Name Description
drop_columns
Required
str or list[str]

A column name or list of column names.

add_prediction_transform_type

Add a prediction transform type for the target column.

add_prediction_transform_type(prediction_transform_type: str) -> None

Parameters

Name Description
prediction_transform_type
Required
str

A prediction transform type to be used for casting the target column. The type must be one of those given in the FULL_SET attribute of the PredictionTransformTypes class.

add_transformer_params

Add customized transformer parameters to the list of custom transformer parameters.

Apply to all columns if column list is empty.

add_transformer_params(transformer: str, cols: List[str], params: Dict[str, Any]) -> None

Parameters

Name Description
transformer
Required
str

The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class.

cols
Required
list[str]

Input columns for the specified transformer. Some transformers can take multiple columns as input, specified as a list.

params
Required
dict[str, Any]

A dictionary of keyword arguments for the transformer.

Remarks

The following code example shows how to customize featurization in automated ML for forecasting. The example sets a column purpose and adds transformer parameters.


   from azureml.automl.core.featurization import FeaturizationConfig

   featurization_config = FeaturizationConfig()
   # Force the CPWVOL5 feature to be numeric type.
   featurization_config.add_column_purpose("CPWVOL5", "Numeric")
   # Fill missing values in the target column, Quantity, with zeros.
   featurization_config.add_transformer_params(
       "Imputer", ["Quantity"], {"strategy": "constant", "fill_value": 0}
   )
   # Fill missing values in the INCOME column with median value.
   featurization_config.add_transformer_params(
       "Imputer", ["INCOME"], {"strategy": "median"}
   )
   # Fill missing values in the Price column with forward fill (last value carried forward).
   featurization_config.add_transformer_params("Imputer", ["Price"], {"strategy": "ffill"})

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb

get_transformer_params

Retrieve transformer customization parameters for columns.

get_transformer_params(transformer: str, cols: List[str]) -> Dict[str, Any]

Parameters

Name Description
transformer
Required
str

The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class.

cols
Required
list[str]

The column names to get information for. Use an empty list to specify all columns.

Returns

Type Description
dict[str, Any]

Transformer parameter settings.
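
To make the lookup concrete, the sketch below models the documented transformer_params shape with plain Python and a hypothetical helper, lookup_params. It assumes an exact match on the column list and an empty dictionary when nothing is stored; both are assumptions for illustration, not a description of how the SDK resolves lookups.

```python
from typing import Any, Dict, List, Tuple

# Same shape as the transformer_params constructor argument.
TransformerParams = Dict[str, List[Tuple[List[str], Dict[str, Any]]]]


def lookup_params(
    store: TransformerParams, transformer: str, cols: List[str]
) -> Dict[str, Any]:
    """Return the parameters stored for exactly this column list, else {}."""
    for stored_cols, params in store.get(transformer, []):
        if stored_cols == cols:
            return params
    return {}


store: TransformerParams = {
    "Imputer": [
        (["CACH"], {"strategy": "median"}),
        ([], {"strategy": "most_frequent"}),  # empty list: applies to all columns
    ],
}

cach_params = lookup_params(store, "Imputer", ["CACH"])
all_columns_params = lookup_params(store, "Imputer", [])
missing_params = lookup_params(store, "TfIdf", ["CACH"])
print(cach_params, all_columns_params, missing_params)
```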

remove_column_purpose

Remove the feature type for the specified column.

If no feature type is specified for a column, the detected default feature type is used.

remove_column_purpose(column_name: str) -> None

Parameters

Name Description
column_name
Required
str

The column name to update.
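
The fallback described above can be pictured with two plain dictionaries. In this sketch, detected_types stands in for whatever automated ML would detect on its own (the values are made up for illustration), and column_purposes holds the overrides; removing an override makes the column fall back to the detected type. This models the documented behavior only and is not SDK code.

```python
from typing import Dict

# Hypothetical auto-detected feature types (illustrative values only).
detected_types: Dict[str, str] = {"MYCT": "Categorical", "VendorName": "Text"}
# Overrides, as add_column_purpose("MYCT", "Numeric") would record them.
column_purposes: Dict[str, str] = {"MYCT": "Numeric"}


def effective_type(column: str) -> str:
    """Return the override if one exists, otherwise the detected type."""
    return column_purposes.get(column, detected_types[column])


before = effective_type("MYCT")
column_purposes.pop("MYCT", None)  # models remove_column_purpose("MYCT")
after = effective_type("MYCT")
print(before, after)
```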

remove_prediction_transform_type

Revert the prediction transform type to the default for the target column.

remove_prediction_transform_type() -> None

remove_transformer_params

Remove transformer customization parameters for specific columns or for all columns.

remove_transformer_params(transformer: str, cols: List[str] | None = None) -> None

Parameters

Name Description
transformer
Required
str

The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class.

cols
list[str] or None

The column names to remove customization parameters from. Specify None (the default) to remove all customization parameters for the specified transformer.

Default value: None
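
The difference between passing a column list and passing None can be sketched on the documented transformer_params shape. The helper drop_params below is hypothetical and assumes an exact match on the column list; it illustrates the two cases rather than reproducing the SDK's implementation.

```python
from typing import Any, Dict, List, Optional, Tuple

# Same shape as the transformer_params constructor argument.
TransformerParams = Dict[str, List[Tuple[List[str], Dict[str, Any]]]]


def drop_params(
    store: TransformerParams, transformer: str, cols: Optional[List[str]] = None
) -> None:
    """Remove entries for one column list, or every entry when cols is None."""
    if cols is None:
        store.pop(transformer, None)
    elif transformer in store:
        store[transformer] = [
            (stored_cols, params)
            for stored_cols, params in store[transformer]
            if stored_cols != cols
        ]


store: TransformerParams = {
    "Imputer": [
        (["CACH"], {"strategy": "median"}),
        (["PRP"], {"strategy": "most_frequent"}),
    ],
}

drop_params(store, "Imputer", ["CACH"])  # remove one column's customization
remaining = list(store["Imputer"])
drop_params(store, "Imputer")  # cols=None: remove all customizations
print(remaining, store)
```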

Attributes

blocked_transformers

column_purposes

dataset_language

drop_columns

prediction_transform_type

transformer_params