Review App

The Review App is a web-based UI designed for collecting structured feedback from domain experts without requiring them to write code. Use it to gather insights that improve your GenAI app's quality and align LLM judges with business requirements.


Two ways to use the Review App

Label existing traces

Ask experts to review existing interactions with your app to provide feedback and expectations.

Use this to:

  • Understand what high-quality, correct responses look like for specific queries
  • Collect input to align LLM judges with your business requirements
  • Create evaluation datasets from production traces

Vibe check a pre-production app

To use vibe check mode, you must have your application deployed.

Ask experts to chat with a deployed app and provide feedback on the app's responses in real-time.

Use this to:

  • Get quick feedback on new app versions before deployment
  • Test app behavior without impacting your production environment
  • Validate improvements with domain experts

Mode comparison

| Aspect | Label existing traces | Vibe check mode |
| --- | --- | --- |
| Input source | Existing traces | Domain expert enters queries |
| Output source | Existing traces | Live agent endpoint responses |
| Custom labeling schema | ✅ Yes - define custom questions and criteria | ❌ No - uses fixed feedback questions |
| Results stored in | MLflow Traces (inside a Labeling Session) | MLflow Traces |

Prerequisites

  1. Install MLflow and required packages

    pip install --upgrade "mlflow[databricks]>=3.1.0" openai "databricks-connect>=16.1"
    
  2. Create an MLflow experiment by following the set up your environment quickstart (a minimal setup sketch follows this list).

  3. For vibe check mode only: a deployed agent endpoint using Agent Framework
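
If you prefer to complete step 2 in code, a minimal sketch is shown below; the experiment path is a placeholder, and in a Databricks notebook the tracking URI is already configured for you.

    import mlflow

    # Point MLflow at your Databricks workspace when running outside a notebook
    mlflow.set_tracking_uri("databricks")

    # Create the experiment if it doesn't exist, and make it the active experiment
    # (the path below is a placeholder; use any workspace path you own)
    mlflow.set_experiment("/Users/<your-username>/review-app-demo")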

1. Labeling existing traces

Labeling existing traces allows you to collect structured feedback on traces you've already captured from production or development. This is ideal for building evaluation datasets, understanding quality patterns, and training custom LLM judges.

The process involves creating a labeling session, defining what feedback to collect, adding traces to review, and sharing the session with domain experts. For complete step-by-step instructions, see Label existing traces.

For detailed information about labeling sessions, schemas, and configuration options, see Labeling Sessions and Labeling Schemas.
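
For illustration, the sketch below strings these steps together with the MLflow GenAI labeling APIs. The schema name, question, reviewer email, and trace query are placeholders; see Labeling Sessions and Labeling Schemas for the exact arguments supported by your MLflow version.

    import mlflow
    from mlflow.genai import label_schemas, labeling

    # 1. Define what feedback to collect (placeholder question)
    quality = label_schemas.create_label_schema(
        name="response_quality",
        type=label_schemas.LabelSchemaType.FEEDBACK,
        title="Is this response accurate and helpful?",
        input=label_schemas.InputCategorical(options=["Yes", "No"]),
        enable_comment=True,
    )

    # 2. Create a labeling session that uses the schema and assign a reviewer
    session = labeling.create_labeling_session(
        name="expert_review",
        label_schemas=[quality.name],
        assigned_users=["expert@company.com"],  # placeholder reviewer
    )

    # 3. Add existing traces from the current experiment to the session
    traces = mlflow.search_traces(max_results=20)
    session.add_traces(traces)

    # 4. Share the Review App link with your domain experts
    print(f"Share this URL: {session.url}")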

2. Vibe check mode

  1. Package your app using Agent Framework and deploy it as a Model Serving endpoint.

  2. Add the endpoint to your experiment's review app:

    Note

    The example below adds a Databricks-hosted LLM to the review app. Replace the endpoint with your app's endpoint from step 1.

    from mlflow.genai.labeling import get_review_app
    
    # Get review app for current MLflow experiment
    review_app = get_review_app()
    
    # Connect your deployed agent endpoint
    review_app.add_agent(
        agent_name="claude-sonnet",
        model_serving_endpoint="databricks-claude-3-7-sonnet",
    )
    
    print(f"Share this URL: {review_app.url}/chat")
    

Domain experts can now chat with your app and provide immediate feedback.

Permissions model

For labeling existing traces

Domain experts need:

  • Account access: Must be provisioned in your Databricks account, but do not need access to your workspace
  • Experiment access: WRITE permission to the MLflow experiment
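
If you manage access in code rather than the UI, the sketch below grants write-level access with the Databricks SDK; the experiment path and user email are placeholders, and CAN_EDIT is used here as the write permission level.

    import mlflow
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.ml import (
        ExperimentAccessControlRequest,
        ExperimentPermissionLevel,
    )

    w = WorkspaceClient()

    # Look up the Review App's experiment (path is a placeholder)
    experiment = mlflow.get_experiment_by_name("/Users/<your-username>/review-app-demo")

    # Grant a domain expert write-level access to the experiment
    w.experiments.update_permissions(
        experiment_id=experiment.experiment_id,
        access_control_list=[
            ExperimentAccessControlRequest(
                user_name="expert@company.com",
                permission_level=ExperimentPermissionLevel.CAN_EDIT,
            )
        ],
    )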

For vibe check mode

Domain experts need:

  • Account access: Must be provisioned in your Databricks account, but do not need access to your workspace
  • Endpoint access: CAN_QUERY permission to the model serving endpoint
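
Similarly, a hedged sketch for granting CAN_QUERY on the serving endpoint with the Databricks SDK; the endpoint name and user email are placeholders.

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.serving import (
        ServingEndpointAccessControlRequest,
        ServingEndpointPermissionLevel,
    )

    w = WorkspaceClient()

    # Look up the serving endpoint backing your agent (name is a placeholder)
    endpoint = w.serving_endpoints.get("my-agent-endpoint")

    # Grant a domain expert query access to the endpoint
    w.serving_endpoints.update_permissions(
        serving_endpoint_id=endpoint.id,
        access_control_list=[
            ServingEndpointAccessControlRequest(
                user_name="expert@company.com",
                permission_level=ServingEndpointPermissionLevel.CAN_QUERY,
            )
        ],
    )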

Setting up account access

For users without workspace access, account admins can:

  • Use account-level SCIM provisioning to sync users from your identity provider
  • Manually register users and groups in Databricks

See User and group management for details.

Content rendering

The Review App automatically renders different content types from your MLflow Trace:

  • Retrieved documents: Documents within a RETRIEVER span are rendered for display
  • OpenAI-format messages: Inputs and outputs of the MLflow Trace that follow the OpenAI chat format are rendered as a chat conversation
  • Dictionaries: Inputs and outputs of the MLflow Trace that are dicts are rendered as pretty-printed JSON

Otherwise, the content of the input and output from the root span of each trace is used as the primary content for review.
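
As a rough illustration of a trace shape that renders well, the sketch below uses a RETRIEVER span that returns MLflow Document objects and a root span whose input and output follow the OpenAI chat format; the function names and contents are placeholders.

    import mlflow
    from mlflow.entities import Document, SpanType

    @mlflow.trace(span_type=SpanType.RETRIEVER)
    def retrieve_docs(query: str) -> list[Document]:
        # Documents returned from a RETRIEVER span show up as retrieved documents
        return [
            Document(
                page_content="MLflow Tracing captures each step of your app.",
                metadata={"doc_uri": "docs/tracing.md"},
            )
        ]

    @mlflow.trace(span_type=SpanType.AGENT)
    def answer(messages: list[dict]) -> dict:
        # Root-span inputs/outputs in OpenAI chat format render as a conversation
        docs = retrieve_docs(messages[-1]["content"])
        return {
            "choices": [
                {"message": {"role": "assistant", "content": docs[0].page_content}}
            ]
        }

    answer([{"role": "user", "content": "What does MLflow Tracing capture?"}])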

Accessing feedback data

After experts provide feedback, the labels are stored in MLflow Traces in your Experiment. Use the Traces tab or Labeling Sessions tab to view the data.
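
To read the labels programmatically instead, a minimal sketch, assuming `session` is the labeling session object created earlier and that labels are attached to its traces as assessments:

    import mlflow

    # Fetch the traces linked to the labeling session's MLflow run
    labeled = mlflow.search_traces(run_id=session.mlflow_run_id, return_type="list")

    for trace in labeled:
        for assessment in trace.info.assessments:
            # Each label appears as an assessment (feedback or expectation) on the trace
            print(trace.info.trace_id, assessment.name)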

Next Steps