Machine learning modules in ML Studio (classic)
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
The typical workflow for machine learning includes many phases:
- Identifying a problem to solve and a metric for measuring results.
- Finding, cleaning, and preparing appropriate data.
- Identifying the best features and engineering new features.
- Building, evaluating, and tuning models.
- Using models to generate predictions, recommendations, and other results.
The modules in this section provide tools for the final phases of machine learning, in which you apply an algorithm to data to train a model. In these final phases, you also generate scores, and then evaluate the accuracy and usefulness of the model.
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
List of machine learning tasks by category
- Choose from a variety of customizable machine learning algorithms, including clustering, regression, classification, and anomaly detection models.
- Provide your data to the configured model to learn from patterns and create statistics that can be used for predictions.
- Create predictions using the trained models.
- Measure the accuracy of a trained model, or compare multiple models.
For a detailed description of this experimental workflow, see the credit risk solution walkthrough.
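The four task categories above can be illustrated outside of Studio with a minimal Python sketch. It is not how the Studio (classic) modules are implemented: the function names (`train_linear`, `score`, `evaluate`), the choice of a one-variable least-squares regression, and the data are all assumptions made for illustration.

```python
from statistics import mean


def train_linear(xs, ys):
    """Train: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def score(model, xs):
    """Score: apply the trained model to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def evaluate(predictions, actuals):
    """Evaluate: mean absolute error between predictions and known values."""
    return mean([abs(p - a) for p, a in zip(predictions, actuals)])


# Made-up data: the training set follows y = 2x + 1 exactly.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
test_x, test_y = [5.0, 6.0], [11.0, 13.0]

model = train_linear(train_x, train_y)   # choose and train an algorithm
predictions = score(model, test_x)       # generate predictions
mae = evaluate(predictions, test_y)      # measure accuracy on held-out data
print(model, predictions, mae)
```

Keeping training, scoring, and evaluation as separate steps mirrors the way an experiment chains separate modules, and makes it easy to swap in a different algorithm or metric.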
Prerequisites
Before you can get to the fun part of building a model, typically a lot of preparation is required. This section provides links to tools in Machine Learning Studio (classic) that can help you clean up your data, improve the quality of input, and prevent run-time errors.
Data exploration and data quality
Ensure that your data is the right kind of data, the right quantity, and the right quality for the algorithm you’ve chosen. Understand how much data you have, and how it is distributed. Are there outliers? How were those generated, and what do they mean? Are there any duplicate records?
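As a rough illustration of these checks, the following Python sketch profiles one numeric column and looks for duplicate records. The helper names (`profile_column`, `duplicate_rows`), the 1.5 × IQR outlier rule, and the sample rows are assumptions chosen for the example; they are not part of Studio (classic).

```python
from collections import Counter
from statistics import quantiles


def profile_column(values):
    """Summarize a numeric column and flag values outside the 1.5 * IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }


def duplicate_rows(rows):
    """Return each row that appears more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


# Made-up records: (id, measurement). One row is repeated, one value is extreme.
rows = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 13.0), (5, 12.5), (6, 95.0), (2, 12.0)]

profile = profile_column([measurement for _, measurement in rows])
duplicates = duplicate_rows(rows)
print(profile)
print(duplicates)
```

A flagged value is only a candidate outlier. Whether to keep, correct, or remove it depends on how it was generated and what it means.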
Handle missing values
Missing values can affect your results in many ways. For example, almost all statistical methods discard cases with missing values. By default, Machine Learning follows these rules when it encounters rows with missing values:
- If data used to train a model has missing values, any rows with missing values are skipped.
- If data used as input when scoring against a model has missing values, the missing values are used as inputs, but nulls are propagated. This usually means that a null is inserted in the results instead of a valid prediction.
Be sure to check your data before training your model. To impute the missing values or correct your data, use the Clean Missing Data module.
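The default behavior described above, and one common alternative, can be sketched in plain Python. This is an illustration only: the helper names (`drop_incomplete`, `score_with_null_propagation`, `impute_mean`), the use of `None` to represent a missing value, and the stand-in model are all assumptions, not Studio (classic) APIs.

```python
from statistics import mean


def drop_incomplete(rows):
    """Training-time rule: skip any row that contains a missing value."""
    return [row for row in rows if None not in row]


def score_with_null_propagation(predict, rows):
    """Scoring-time rule: a row with a missing input yields a null result."""
    return [None if None in row else predict(row) for row in rows]


def impute_mean(rows, column):
    """Alternative: replace missing values in one column with the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        tuple(fill if i == column and v is None else v for i, v in enumerate(row))
        for row in rows
    ]


# Made-up rows; None marks a missing value.
rows = [(1.0, 2.0), (None, 4.0), (3.0, 6.0)]


def add_columns(row):
    """Hypothetical stand-in for a trained model."""
    return row[0] + row[1]


training_rows = drop_incomplete(rows)
scored = score_with_null_propagation(add_columns, rows)
imputed = impute_mean(rows, column=0)
print(training_rows, scored, imputed)
```

Mean imputation is only one option. Which replacement strategy is appropriate depends on the data and on why the values are missing.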
Select features and reduce dimensionality
Machine Learning Studio (classic) can help you sift through your data to find the most useful attributes.
- Use tools such as Fisher Linear Discriminant Analysis or Filter Based Feature Selection to determine which columns of data have the most predictive power. These tools can also identify columns that should be removed because of data leakage.
- Create or engineer features from existing data. Normalize data or group data into bins to make new groupings of data, or standardize the range of numeric values prior to analysis.
- Reduce dimensionality by grouping categorical values, by using principal component analysis, or by sampling.
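To make the first two ideas concrete, here is a small Python sketch of filter-style feature ranking, min-max normalization, and equal-width binning. The function names and data are hypothetical, and Pearson correlation is used only as one simple scoring metric chosen for illustration. The sketch does not describe how the Studio (classic) modules work internally.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_features(columns, label):
    """Order column names by absolute correlation with the label, strongest first."""
    scores = {name: abs(pearson(values, label)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up data: "age" tracks the label closely, "noise" does not.
columns = {"age": [20, 30, 40, 50], "noise": [5, 1, 4, 2]}
label = [1.0, 2.0, 3.0, 4.0]

ranking = rank_features(columns, label)
scaled_age = min_max_scale(columns["age"])
binned_age = equal_width_bins(columns["age"], n_bins=3)
print(ranking, scaled_age, binned_age)
```

A column that correlates almost perfectly with the label, as "age" does in this toy data, is also worth a second look in practice, because it may indicate data leakage rather than genuine predictive power.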
Examples
For examples of machine learning in action, see the Azure AI Gallery.
For tips and a walkthrough of some typical data preparation tasks, see Walkthroughs executing the Team Data Science Process.