The purpose of this maturity model is to help clarify the Machine Learning Operations (MLOps) principles and practices. The maturity model shows the continuous improvement in the creation and operation of a production-level machine learning application environment. You can use it to establish the progressive requirements needed to measure the maturity of a machine learning production environment and its associated processes.
Maturity model
The MLOps maturity model helps clarify the Development Operations (DevOps) principles and practices necessary to run a successful MLOps environment. It's intended to identify gaps in an existing organization's attempt to implement such an environment. It's also a way to show you how to grow your MLOps capability in increments rather than overwhelm you with the requirements of a fully mature environment. Use it as a guide to:
- Estimate the scope of the work for new engagements.
- Establish realistic success criteria.
- Identify deliverables you'll hand over at the conclusion of the engagement.
As with most maturity models, the MLOps maturity model qualitatively assesses people/culture, processes/structures, and objects/technology. As the maturity level increases, the probability increases that incidents or errors will lead to improvements in the quality of the development and production processes.
The MLOps maturity model encompasses five levels of technical capability:
| Level | Description | Highlights | Technology |
| --- | --- | --- | --- |
| 0 | No MLOps | - Difficult to manage full machine learning model lifecycle<br>- Teams are disparate and releases are painful<br>- Most systems exist as "black boxes," with little feedback during or after deployment | - Manual builds and deployments<br>- Manual testing of model and application<br>- No centralized tracking of model performance<br>- Training of model is manual |
| 1 | DevOps but no MLOps | - Releases are less painful than No MLOps, but rely on the data team for every new model<br>- Still limited feedback on how well a model performs in production<br>- Difficult to trace/reproduce results | - Automated builds<br>- Automated tests for application code |
| 2 | Automated Training | - Training environment is fully managed and traceable<br>- Easy to reproduce a model<br>- Releases are manual, but low friction | - Automated model training<br>- Centralized tracking of model training performance<br>- Model management |
| 3 | Automated Model Deployment | - Releases are low friction and automatic<br>- Full traceability from deployment back to original data<br>- Entire environment managed: train > test > production | - Integrated A/B testing of model performance for deployment<br>- Automated tests for all code<br>- Centralized tracking of model training performance |
| 4 | Full MLOps Automated Operations | - Full system automated and easily monitored<br>- Production systems provide information on how to improve and, in some cases, automatically improve with new models<br>- Approaching a zero-downtime system | - Automated model training and testing<br>- Verbose, centralized metrics from the deployed model |
The tables that follow identify the detailed characteristics for each level of process maturity. The model will continue to evolve.
Level 0: No MLOps
| People | Model Creation | Model Release | Application Integration |
| --- | --- | --- | --- |
| - Data scientists: siloed, not in regular communication with the larger team<br>- Data engineers (if they exist): siloed, not in regular communication with the larger team<br>- Software engineers: siloed, receive the model remotely from the other team members | - Data gathered manually<br>- Compute is likely not managed<br>- Experiments aren't predictably tracked<br>- End result might be a single model file manually handed off with inputs/outputs | - Manual process<br>- Scoring script might be manually created well after experiments, not version controlled<br>- Release handled by data scientist or data engineer alone | - Heavily reliant on data scientist expertise to implement<br>- Manual releases each time |
Level 1: DevOps but no MLOps
| People | Model Creation | Model Release | Application Integration |
| --- | --- | --- | --- |
| - Data scientists: siloed, not in regular communication with the larger team<br>- Data engineers (if they exist): siloed, not in regular communication with the larger team<br>- Software engineers: siloed, receive the model remotely from the other team members | - Data pipeline gathers data automatically<br>- Compute might or might not be managed<br>- Experiments aren't predictably tracked<br>- End result might be a single model file manually handed off with inputs/outputs | - Manual process<br>- Scoring script might be manually created well after experiments, likely version controlled<br>- Scoring script is handed off to software engineers | - Basic integration tests exist for the model<br>- Heavily reliant on data scientist expertise to implement the model<br>- Releases are automated<br>- Application code has unit tests (see the test sketch after this table) |
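At this level, build automation covers the application code around the model, even though the model itself is still handed off manually. The following is a minimal sketch of what an automated unit test for the scoring path might look like; the `predict` wrapper, its input contract, and the feature names are hypothetical stand-ins, not part of any specific library.

```python
# test_scoring.py -- minimal sketch of "automated tests for application code"
# at this level. The scoring function and its input contract are hypothetical
# stand-ins for the model handed off by the data team.
import pytest

REQUIRED_FEATURES = {"feature_a", "feature_b"}  # hypothetical input contract


def predict(features: dict) -> dict:
    """Stand-in scoring wrapper; in practice this loads the handed-off model file."""
    missing = REQUIRED_FEATURES - features.keys()
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Placeholder score; a real wrapper would call model.predict(...)
    return {"score": 0.5}


def test_predict_returns_bounded_score():
    # Runs in the automated build on every commit, without retraining anything.
    result = predict({"feature_a": 1.2, "feature_b": 3.4})
    assert 0.0 <= result["score"] <= 1.0


def test_predict_rejects_missing_features():
    # Guards the application/model input contract.
    with pytest.raises(ValueError):
        predict({"feature_a": 1.2})
```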
Level 2: Automated Training
| People | Model Creation | Model Release | Application Integration |
| --- | --- | --- | --- |
| - Data scientists: Working directly with data engineers to convert experimentation code into repeatable scripts/jobs<br>- Data engineers: Working with data scientists<br>- Software engineers: siloed, receive the model remotely from the other team members | - Data pipeline gathers data automatically<br>- Compute managed<br>- Experiment results tracked (see the tracking sketch after this table)<br>- Both training code and resulting models are version controlled | - Manual release<br>- Scoring script is version controlled with tests<br>- Release managed by the software engineering team | - Basic integration tests exist for the model<br>- Heavily reliant on data scientist expertise to implement the model<br>- Application code has unit tests |
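The automated training and centralized experiment tracking called out above are typically implemented with an experiment-tracking tool, so that parameters, metrics, and the resulting model are versioned together. A minimal sketch using MLflow's tracking API follows; the experiment name, dataset path, and hyperparameters are illustrative assumptions rather than prescribed values.

```python
# train.py -- minimal sketch of a repeatable training job with centralized
# experiment tracking, here using MLflow's tracking API. The dataset path,
# hyperparameters, and model choice are illustrative assumptions.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("churn-model")  # hypothetical experiment name


def main():
    data = pd.read_csv("data/training.csv")  # produced by the automated data pipeline
    X, y = data.drop(columns=["label"]), data["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    with mlflow.start_run():
        params = {"n_estimators": 200, "max_depth": 8}
        mlflow.log_params(params)

        model = RandomForestClassifier(**params).fit(X_train, y_train)
        accuracy = accuracy_score(y_test, model.predict(X_test))

        # Centralized tracking: the metric and the versioned model artifact
        # are recorded against this run so the model can be reproduced later.
        mlflow.log_metric("accuracy", accuracy)
        mlflow.sklearn.log_model(model, "model")


if __name__ == "__main__":
    main()
```

Because every run records its parameters, metric, and model artifact in one place, reproducing a model becomes a matter of re-running the script against the same tracked inputs.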
Level 3: Automated Model Deployment
| People | Model Creation | Model Release | Application Integration |
| --- | --- | --- | --- |
| - Data scientists: Working directly with data engineers to convert experimentation code into repeatable scripts/jobs<br>- Data engineers: Working with data scientists and software engineers to manage inputs/outputs<br>- Software engineers: Working with data engineers to automate model integration into application code | - Data pipeline gathers data automatically<br>- Compute managed<br>- Experiment results tracked<br>- Both training code and resulting models are version controlled | - Automatic release (see the champion/challenger sketch after this table)<br>- Scoring script is version controlled with tests<br>- Release managed by a continuous integration and continuous delivery (CI/CD) pipeline | - Unit and integration tests for each model release<br>- Less reliant on data scientist expertise to implement the model<br>- Application code has unit/integration tests |
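The overview table lists integrated A/B testing of model performance as part of automated deployment at this level. One common pattern is a champion/challenger traffic split in the scoring path; the sketch below shows the idea, and the variant names, traffic fraction, and `models` lookup are illustrative assumptions.

```python
# route_score.py -- minimal sketch of an A/B (champion/challenger) split used
# to compare model performance during an automated release. The variant names,
# traffic fraction, and the models lookup are illustrative assumptions.
import hashlib

CHALLENGER_TRAFFIC = 0.10  # fraction of requests scored by the new model version


def assign_variant(request_id: str) -> str:
    """Deterministically assign a request to champion or challenger."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < CHALLENGER_TRAFFIC * 100 else "champion"


def score(request_id: str, features: dict, models: dict) -> dict:
    variant = assign_variant(request_id)
    prediction = models[variant].predict([list(features.values())])[0]
    # Log the variant with every prediction so downstream metrics can compare
    # champion vs. challenger before the rollout is completed.
    return {"variant": variant, "prediction": prediction}
```

Keeping the assignment deterministic per request ID means a retried request always hits the same variant, which keeps the comparison metrics clean.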
Level 4: Full MLOps Automated Retraining
| People | Model Creation | Model Release | Application Integration |
| --- | --- | --- | --- |
| - Data scientists: Working directly with data engineers to convert experimentation code into repeatable scripts/jobs. Working with software engineers to identify markers for data engineers<br>- Data engineers: Working with data scientists and software engineers to manage inputs/outputs<br>- Software engineers: Working with data engineers to automate model integration into application code. Implementing post-deployment metrics gathering | - Data pipeline gathers data automatically<br>- Retraining triggered automatically based on production metrics (see the trigger sketch after this table)<br>- Compute managed<br>- Experiment results tracked<br>- Both training code and resulting models are version controlled | - Automatic release<br>- Scoring script is version controlled with tests<br>- Release managed by a continuous integration and continuous delivery (CI/CD) pipeline | - Unit and integration tests for each model release<br>- Less reliant on data scientist expertise to implement the model<br>- Application code has unit/integration tests |
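At this level, production metrics close the loop: when the deployed model's quality degrades, retraining starts automatically rather than waiting for someone to notice. The sketch below illustrates one way such a trigger could look; the accuracy threshold, window size, and `trigger_training_pipeline` callable are illustrative assumptions.

```python
# retrain_trigger.py -- minimal sketch of "retraining triggered automatically
# based on production metrics." The metric source, threshold, window size, and
# the trigger_training_pipeline callable are illustrative assumptions.
import statistics

ACCURACY_THRESHOLD = 0.85  # hypothetical minimum acceptable rolling accuracy
WINDOW = 10                # number of recent evaluations to average


def should_retrain(recent_accuracy: list[float]) -> bool:
    """Decide whether production metrics have degraded enough to retrain."""
    if len(recent_accuracy) < WINDOW:
        return False  # not enough evidence yet
    return statistics.mean(recent_accuracy[-WINDOW:]) < ACCURACY_THRESHOLD


def check_and_trigger(recent_accuracy: list[float], trigger_training_pipeline) -> bool:
    # Called on a schedule or from a streaming metrics job: when the rolling
    # accuracy of the deployed model drops below the threshold, kick off the
    # same automated training pipeline used for regular releases.
    if should_retrain(recent_accuracy):
        trigger_training_pipeline()
        return True
    return False
```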
Next steps