MLflow Tracing - generative AI observability

MLflow Tracing is a powerful feature that provides end-to-end observability for gen AI applications, including complex agent-based systems. It records inputs, outputs, intermediate steps, and metadata to give you a complete picture of how your app behaves.

Tracing allows you to:

  • Debug and understand your application
  • Monitor performance and optimize cost
  • Evaluate and enhance application quality
  • Ensure auditability and compliance
  • Integrate tracing with many popular third-party frameworks

Want to get started with tracing?

See the quickstarts to get hands-on:

  • Local IDE: Tracing quickstart
  • Databricks Notebook: Tracing quickstart

Here's how easy it is to get started with MLflow Tracing in just a few lines of code. mlflow.openai.autolog() automatically traces every OpenAI call in your application - no other code changes required:

import mlflow
import openai
import os  # Used to configure credentials via environment variables

# Configure Databricks Authentication (if running outside Databricks)
# If running in a Databricks notebook, these are not needed.
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-databricks-token"

# Configure OpenAI API Key (replace with your actual key)
# os.environ["OPENAI_API_KEY"] = "your-api-key-here"

# Enable automatic tracing for OpenAI - that's it!
mlflow.openai.autolog()

# Set up MLflow tracking
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/my-genai-app")

# Your existing OpenAI code works without any changes
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain MLflow Tracing"}],
    max_tokens=100
)

# Traces are automatically captured and logged to MLflow!

Tip

If you're running inside a Databricks notebook, the mlflow.set_tracking_uri("databricks") and mlflow.set_experiment() calls aren't necessary—MLflow is automatically configured for you.
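
Once the call completes, the trace appears in the specified experiment and can be opened in the MLflow UI. You can also fetch it programmatically for a quick sanity check; here is a minimal sketch, assuming MLflow 3 (where mlflow.get_last_active_trace_id() is available; exact Trace object fields can differ slightly between versions):

# Retrieve the trace that the snippet above just recorded
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)

# Each step captured by autologging shows up as a span on the trace
print(f"Trace {trace_id} recorded {len(trace.data.spans)} span(s)")
for span in trace.data.spans:
    print(span.name)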

Debug and understand your application

MLflow Tracing provides deep insights into your application's behavior, facilitating a complete debugging experience across different environments. By capturing the complete request-response cycle (Input/Output Tracking) and the execution flow, you can visualize and understand your application's logic and decision-making process.

Examining the inputs, outputs, and metadata for each intermediate step (e.g., retrieval, tool calls, LLM interactions) and associated user feedback or the results of quality evaluations allows you to:

  • In Development: Get detailed visibility into what happens beneath the abstractions of GenAI libraries, helping you precisely identify where issues or unexpected behaviors occur.
  • In Production: Monitor and debug issues in real time. Traces capture errors and can include operational metrics such as latency at each step, aiding quick diagnostics.

MLflow Tracing offers a unified experience between development and production: you instrument your application once, and tracing works consistently in both environments. This allows you to navigate traces seamlessly within your preferred environment—be it your IDE, notebook, or production monitoring dashboard—eliminating the hassle of switching between multiple tools or searching through overwhelming logs.
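
For example, during a debugging session you can pull recent traces from an experiment and drill into the spans of a problematic request. Below is a minimal sketch, assuming the experiment from the snippet above; the exact DataFrame column names returned by mlflow.search_traces() vary slightly between MLflow versions:

import mlflow

mlflow.set_experiment("/Shared/my-genai-app")  # placeholder experiment from the example above

# Fetch the most recent traces as a pandas DataFrame
traces = mlflow.search_traces(max_results=10)
print(traces.columns.tolist())  # see which fields your MLflow version returns

# Drill into a single trace to inspect each step (retrieval, tool call, LLM call, ...)
trace = mlflow.get_trace(traces.iloc[0]["trace_id"])  # column may be "request_id" on older versions
for span in trace.data.spans:
    print(span.name, span.inputs, span.outputs)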

Tracing Error Screenshot

Monitor performance and optimize costs

Understanding and optimizing the performance and cost of your GenAI applications is crucial. MLflow Tracing enables you to capture and monitor key operational metrics such as latency, cost (e.g., token usage), and resource utilization at each step of your application's execution.

This allows you to:

  • Track and identify performance bottlenecks within complex pipelines.
  • Monitor resource utilization to ensure efficient operation.
  • Optimize cost efficiency by understanding where resources or tokens are consumed.
  • Identify areas for performance improvement in your code or model interactions.
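
As a concrete example, per-trace latency (and, where the integration records it, token usage) can be read straight from the logged traces. This is a minimal sketch; the column and span attribute names used here are assumptions that vary by MLflow version and integration:

import mlflow

traces = mlflow.search_traces(max_results=100)

# Per-trace latency as recorded by MLflow (column name differs across versions)
latency_col = "execution_time_ms" if "execution_time_ms" in traces.columns else "execution_duration"
print(traces[latency_col].describe())

# Token usage, when the integration records it, lives on the LLM spans' attributes
trace = mlflow.get_trace(traces.iloc[0]["trace_id"])
for span in trace.data.spans:
    usage = span.attributes.get("mlflow.chat.tokenUsage")  # attribute key is an assumption
    if usage:
        print(span.name, usage)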

Furthermore, MLflow Tracing is compatible with OpenTelemetry, an industry-standard observability specification. This compatibility allows you to export your trace data to various services in your existing observability stack. See Export Traces to Other Services for more details.
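
Exporting to an OpenTelemetry collector is typically a configuration change rather than a code change. A minimal sketch, assuming a local OTLP collector on the default gRPC port; check the export guide for the exact variables your MLflow version honors:

import os

# Standard OpenTelemetry OTLP settings; when set before tracing starts,
# MLflow sends trace data to the collector rather than the MLflow backend.
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317"  # placeholder endpoint
os.environ["OTEL_EXPORTER_OTLP_TRACES_PROTOCOL"] = "grpc"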

Evaluate and enhance application quality

Systematically assessing and improving the quality of your GenAI applications is a core challenge. MLflow Tracing helps by allowing you to attach and track user feedback and the results of quality evaluations (from LLM judges or custom metrics) directly to your traces.

This enables comprehensive quality assessment throughout your application's lifecycle:

  • During Development: Evaluate traces using human reviewers or LLM judges to:
    • Measure accuracy, relevance, and other quality aspects.
    • Track quality improvements as you iterate on prompts, models, or retrieval strategies.
    • Identify patterns in quality issues (e.g., specific types of queries that lead to poor responses).
    • Make data-driven improvements to your application.
  • In Production: Monitor and assess quality in real time by:
    • Tracking quality metrics (derived from user feedback and evaluation results) across deployments.
    • Identifying sudden quality degradation or regressions.
    • Triggering alerts for critical quality issues.
    • Helping maintain quality Service Level Agreements (SLAs).
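
For example, end-user feedback can be attached directly to the trace that produced a response. A minimal sketch, assuming MLflow 3's mlflow.log_feedback() API and a placeholder trace ID; see the assessments documentation for the exact signature in your version:

import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

# Attach a thumbs-up style rating from an end user to an existing trace
mlflow.log_feedback(
    trace_id="tr-1234567890",  # placeholder: the trace that produced the response
    name="user_satisfaction",
    value=True,
    source=AssessmentSource(
        source_type=AssessmentSourceType.HUMAN,
        source_id="user-42",  # placeholder user identifier
    ),
    rationale="User marked the answer as helpful",
)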

Traces from both evaluation runs and production monitoring can be explored to identify root causes of quality issues—for instance, insufficiently retrieved documents in a RAG system or degraded performance of a specific model. Traces empower you to analyze these issues in detail and iterate quickly.

Moreover, traces are invaluable for building high-quality evaluation datasets. By capturing real user interactions and their outcomes, you can:

  • Curate representative test cases based on actual usage patterns.
  • Build comprehensive evaluation sets that cover diverse scenarios.
  • Use this data to fine-tune models or improve retrieval mechanisms.
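
One way to bootstrap such a dataset is to export real requests and responses from logged traces. A minimal sketch, assuming the request and response columns returned by mlflow.search_traces(); column names and payload formats are assumptions that differ between versions:

import json
import mlflow

traces = mlflow.search_traces(max_results=500)

# Turn logged interactions into candidate evaluation cases
eval_cases = []
for _, row in traces.iterrows():
    eval_cases.append(
        {
            "inputs": json.loads(row["request"]),    # column name and JSON format are assumptions
            "outputs": json.loads(row["response"]),
        }
    )
print(f"Collected {len(eval_cases)} candidate evaluation cases")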

When combined with MLflow LLM Evaluation, MLflow offers a seamless experience for assessing and improving your application's quality.

Tracing Assessments

Ensure auditability and compliance

MLflow Tracing enables you to capture every execution of your application, creating a detailed audit trail of how every output was generated. This is essential for maintaining transparency, accountability, and compliance in your GenAI applications.

With complete visibility into the execution flow—including all inputs, outputs, intermediate steps, and parameters used—you can:

  • Track and verify the origins of all outputs.
  • Provide evidence for compliance requirements.
  • Enable thorough post-hoc analysis of your application's behavior for specific requests.
  • Debug historical issues by examining past traces.

This comprehensive logging ensures that you have the necessary records for internal audits or external regulatory needs.
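
In practice, an audit query is a filtered trace search over the experiment that served the requests. A minimal sketch, assuming your application tagged each trace with a session identifier; the tag name and filter syntax here are placeholders:

import mlflow

mlflow.set_experiment("/Shared/my-genai-app")  # placeholder experiment

# Look up every trace recorded for a given end-user session
audit_traces = mlflow.search_traces(
    filter_string="tags.session_id = 'session-1234'",  # placeholder tag set by your app
    max_results=100,
)

# Each trace carries the full inputs, outputs, and intermediate steps for that request
for _, row in audit_traces.iterrows():
    print(row["trace_id"], row["request"])  # column names may differ by MLflow version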

Trace params

Broad framework support and extensibility

MLflow Tracing is designed to fit into your existing GenAI development workflow with minimal friction. It integrates with 20+ popular GenAI libraries and frameworks out-of-the-box, including OpenAI, LangChain, LlamaIndex, DSPy, Anthropic, and more. For many of these, tracing can be enabled with a single line of code (e.g., mlflow.openai.autolog()).

See the Automatic Tracing section and the Integrations page for the full list of supported libraries and how to use them.

This broad support means you can gain observability without significant code changes, leveraging the tools you already use. For custom components or unsupported libraries, MLflow also provides powerful manual tracing APIs.
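
For custom code, the two main manual APIs are the @mlflow.trace decorator and the mlflow.start_span context manager. Here is a minimal sketch instrumenting a hypothetical retrieval step; the function names, span name, and attribute are illustrative:

import mlflow

def retrieve_documents(question: str) -> list[str]:
    # Wrap an arbitrary block of code in its own span and attach custom data
    with mlflow.start_span(name="vector_search") as span:
        span.set_inputs({"question": question})
        results = ["doc-1", "doc-2"]  # placeholder for your retriever call
        span.set_outputs(results)
        span.set_attribute("num_results", len(results))
    return results

@mlflow.trace  # records inputs, outputs, and latency of this function as a span
def answer_question(question: str) -> str:
    documents = retrieve_documents(question)
    return f"Answered using {len(documents)} documents"

answer_question("What is MLflow Tracing?")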

MLflow versioning recommendations

While tracing features are available in MLflow 2.15.0+, it is strongly recommended to install MLflow 3 (specifically 3.1 or newer if using mlflow[databricks]) for the latest gen AI capabilities.

For production environments that only need tracing, consider the mlflow-tracing package. For development and experimentation using Databricks, use mlflow[databricks]>=3.1.
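
For reference, the corresponding install commands look like this (package names taken from the recommendations above):

pip install --upgrade "mlflow[databricks]>=3.1"   # development and experimentation with Databricks
pip install --upgrade mlflow-tracing              # lightweight, tracing-only production deployments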

Next steps

Continue your journey with these recommended actions and tutorials.

Reference guides

Explore detailed documentation for concepts and features mentioned in this guide.