The Review App is a web-based UI designed for collecting structured feedback from domain experts without requiring them to write code. Use it to gather insights that improve your GenAI app's quality and align LLM judges with business requirements.
Two ways to use the Review App
Label existing traces
Ask experts to review existing interactions with your app and provide feedback and expectations.
Use this to:
- Understand what high-quality, correct responses look like for specific queries
- Collect input to align LLM judges with your business requirements
- Create evaluation datasets from production traces
Vibe check a pre-production app
To use vibe check mode, you must have your application deployed
Ask experts to chat with a deployed app and provide feedback on the app's responses in real-time.
Use this to:
- Get quick feedback on new app versions before deployment
- Test app behavior without impacting your production environment
- Validate improvements with domain experts
Mode comparison
| Aspect | Label existing traces | Vibe check mode |
|---|---|---|
| Input source | Existing traces | Domain expert enters queries |
| Output source | Existing traces | Live agent endpoint responses |
| Custom labeling schema | ✅ Yes - define custom questions and criteria | ❌ No - uses fixed feedback questions |
| Results stored in | MLflow Traces (inside a Labeling Session) | MLflow Traces |
Prerequisites
Install MLflow and required packages:

```bash
pip install --upgrade "mlflow[databricks]>=3.1.0" openai "databricks-connect>=16.1"
```
Create an MLflow experiment by following the Set up your environment quickstart.
For vibe check mode only: a deployed agent endpoint using Agent Framework
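If you are working outside a Databricks notebook, a minimal environment setup might look like the sketch below; the experiment path is a placeholder to replace with your own.

```python
import mlflow

# Point MLflow at your Databricks workspace (uses your configured
# Databricks authentication, e.g. a CLI profile or environment variables).
mlflow.set_tracking_uri("databricks")

# Use (or create) the MLflow experiment that will hold your traces and
# labeling sessions. The path below is a placeholder; replace it with yours.
mlflow.set_experiment("/Users/<your-username>/review-app-demo")
```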
1. Labeling existing traces
Labeling existing traces allows you to collect structured feedback on traces you've already captured from production or development. This is ideal for building evaluation datasets, understanding quality patterns, and training custom LLM judges.
The process involves creating a labeling session, defining what feedback to collect, adding traces to review, and sharing with domain experts. For complete step-by-step instructions, see Label existing traces.
For detailed information about labeling sessions, schemas, and configuration options, see Labeling Sessions and Labeling Schemas.
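As a rough illustration of that flow, here is a minimal sketch assuming the `mlflow.genai.label_schemas` and `mlflow.genai.labeling` APIs in mlflow[databricks] >= 3.1; the schema name, question, options, and trace count are illustrative placeholders to adapt to your own review criteria.

```python
import mlflow
from mlflow.genai import label_schemas
from mlflow.genai.labeling import create_labeling_session

# Define what feedback to collect (name, title, and options are placeholders).
quality_schema = label_schemas.create_label_schema(
    name="response_quality",
    type=label_schemas.LabelSchemaType.FEEDBACK,
    title="Is this response correct and complete?",
    input=label_schemas.InputCategorical(options=["Yes", "Partially", "No"]),
    overwrite=True,
)

# Create a labeling session that asks reviewers the question above.
session = create_labeling_session(
    name="expert_review",
    label_schemas=[quality_schema.name],
)

# Add existing traces from the current experiment to the session.
traces = mlflow.search_traces(max_results=10)
session.add_traces(traces)

# Share the Review App URL with your domain experts.
print(f"Review these traces at: {session.url}")
```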
2. Vibe check mode
1. Package your app using Agent Framework and deploy it as a Model Serving endpoint.
2. Add the endpoint to your experiment's review app:
Note
The example below adds a Databricks-hosted LLM to the review app. Replace the endpoint with your app's endpoint from step 1.
```python
from mlflow.genai.labeling import get_review_app

# Get review app for current MLflow experiment
review_app = get_review_app()

# Connect your deployed agent endpoint
review_app.add_agent(
    agent_name="claude-sonnet",
    model_serving_endpoint="databricks-claude-3-7-sonnet",
)

print(f"Share this URL: {review_app.url}/chat")
```
Domain experts can now chat with your app and provide immediate feedback.
Permissions model
For labeling existing traces
Domain experts need:
- Account access: Must be provisioned in your Databricks account, but do not need access to your workspace
- Experiment access: WRITE permission to the MLflow experiment
For vibe check mode
Domain experts need:
- Account access: Must be provisioned in your Databricks account, but do not need access to your workspace
- Endpoint access: CAN_QUERY permission to the model serving endpoint
Setting up account access
For users without workspace access, account admins can:
- Use account-level SCIM provisioning to sync users from your identity provider
- Manually register users and groups in Databricks
See User and group management for details.
Content rendering
The Review App automatically renders different content types from your MLflow Trace:

- Retrieved documents: Documents within a `RETRIEVER` span are rendered for display.
- OpenAI format messages: Inputs and outputs of the MLflow Trace that follow OpenAI chat conventions are rendered as conversations:
  - `outputs` that contain an OpenAI format ChatCompletions object
  - `inputs` or `outputs` dicts that contain a `messages` key with an array of OpenAI format chat messages
  - If the `messages` array contains OpenAI format tool calls, they are also rendered
- Dictionaries: Inputs and outputs of the MLflow Trace that are dicts are rendered as pretty-printed JSON.

Otherwise, the content of the `input` and `output` from the root span of each trace is used as the primary content for review.
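For example, a traced function whose input and output are dicts containing a `messages` key in OpenAI chat format is rendered as a conversation rather than raw JSON. A minimal, hypothetical sketch (the function name and hard-coded reply are placeholders for your own app logic):

```python
import mlflow

@mlflow.trace
def answer(messages: list[dict]) -> dict:
    # The trace input is {"messages": [...]} in OpenAI chat format, so the
    # Review App renders it as a conversation.
    reply = {"role": "assistant", "content": "Refunds are accepted within 30 days."}
    # Returning a dict with a `messages` key lets the output render the same way.
    return {"messages": messages + [reply]}

answer([{"role": "user", "content": "What is our refund policy?"}])
```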
Accessing feedback data
After experts provide feedback, the labels are stored in MLflow Traces in your Experiment. Use the Traces tab or Labeling Sessions tab to view the data.
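You can also retrieve the labeled traces programmatically, for example to build an evaluation dataset. A minimal sketch, assuming the active MLflow experiment is the one backing your Review App; reviewer feedback is attached to each trace as assessments:

```python
import mlflow

# Fetch recent traces (and their attached assessments) from the active experiment.
traces = mlflow.search_traces(max_results=50)

# One row per trace; inspect the columns to see the request, response,
# and the feedback collected by the Review App.
print(traces.columns.tolist())
print(traces.head())
```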
Next steps
- Label existing traces - Step-by-step guide to collect structured expert feedback
- Live app testing - Set up vibe check mode for pre-production testing
- Build evaluation datasets - Convert expert feedback into evaluation datasets