Azure Databricks CI/CD Pipeline using Databricks Asset Bundles (DAB) + Azure DevOps — Branching Strategy & Deployment Flow

Shubhangi Nannware 120 Reputation points
2026-05-21T18:17:27.21+00:00

We have only two environments: Dev and Prod.
Problem Statement:
I am setting up a CI/CD pipeline for Azure Databricks using Databricks Asset Bundles (DAB) and Azure DevOps Pipelines. I have created the bundle locally and pushed it to a feature branch (feature_databricks_bundle) in Azure DevOps. Currently, the repo has only a main branch and short-lived feature branches. I want to validate my proposed branching strategy and deployment flow before proceeding.

Questions:

Q1 — Branching Strategy Validation Is the following branching strategy correct for DAB-based CI/CD?

  • Create a permanent dev branch from main
  • Merge feature_databricks_bundle into dev (then delete the feature branch)
  • Developers create future feature branches from dev (not main)
  • On PR approval into dev → auto-deploy to Dev Databricks workspace
  • On PR approval into main → auto-deploy to Prod Databricks workspace

Is this the recommended approach, or should feature branches still be based off main?

Q2 — Auto-discovery of new Notebooks If a developer adds a new notebook to the repository under the bundle's configured src path, will DAB automatically include it in the next deployment without requiring any manual changes to databricks.yml? Or does each new notebook/job need to be explicitly declared in the bundle configuration?

Q3 — Azure DevOps YML Pipeline Triggers For the Azure DevOps .yml pipeline, what is the recommended way to configure branch-based triggers for multi-environment deployments?

Example setup I am thinking:

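# Pipeline file 1: trigger deploy to Dev on merge into dev branch
trigger:
  branches:
    include:
      - dev

# Pipeline file 2: trigger deploy to Prod on merge into main
# (a single YAML file can contain only one top-level trigger key)
trigger:
  branches:
    include:
      - main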

Is it better to have two separate pipeline YML files (one per environment) or a single pipeline with conditional stage execution based on the target branch?

Q4 — DAB Target Configuration In databricks.yml, I plan to define two targets like below. Is this the right way to map targets to workspaces and environments?

targets:
  dev:
    mode: development
    workspace:
      host: https://<dev-workspace>.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://<prod-workspace>.azuredatabricks.net

Should service principal authentication be configured at the target level, and if so, what is the recommended way to pass secrets securely via Azure DevOps pipeline variables?

Q5 — PR Gate: Deploy-before-Merge vs Merge-then-Deploy My current plan is:

  1. Developer raises a PR from feature into dev
  2. CI pipeline runs bundle deployment to Dev workspace as a PR validation gate
  3. If deployment succeeds, PR is approved and merged
  4. Merge into main triggers deployment to Prod
Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.


1 answer

  1. AI answer

    2026-05-21T18:18:01.2+00:00

    Q1 — Branching strategy The proposed strategy is aligned with the recommended patterns, with a small adjustment:

    • Recommended flow from the Databricks CI/CD guidance:
      • Develop locally or in the workspace and deploy to a Databricks development workspace to test changes.
      • Create a feature branch to version control updates and regularly sync local/workspace changes.
      • When testing is finished, merge the feature branch into main.
      • CI/CD automatically deploys main to a staging workspace and runs automated tests.
      • When staging checks pass, CI/CD deploys main to production.
      This is explicitly described as the “simplest best practice” branching strategy for bundles.
    • For Azure DevOps more generally, the guidance is also to keep a simple strategy: main as the continuous development branch, plus release branches as needed, with CI triggered on every check-in to main and release branches.

    Given that, two viable options are:

    1. Simple (recommended for most teams):
      • Feature branches → merge into main.
      • CI on main deploys to Dev (and optionally to Staging → Prod via promotion).
      • Use environments/approvals to control promotion to Prod.
    2. Two long‑lived branches (what is proposed):
      • Feature branches from dev.
      • Merge to dev → deploy to Dev workspace.
      • Merge to main → deploy to Prod workspace.

    Both are compatible with Databricks bundles and Azure DevOps. The documentation, however, describes feature branches merging into main and then using CI/CD to move from staging to production, so feature branches based off main with environment-based promotion is closer to the documented best practice.
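
    The simple option above (feature branches merged into main, with promotion controlled by environments) might look like the following sketch. The environment names databricks-dev and databricks-prod, the variable group databricks-ci, and the secret variable names are assumptions for illustration, and the bundle root is assumed to be the repository root. Approval checks for the prod environment would be configured on that environment in Azure DevOps, not in the YAML.

    ```yaml
    # azure-pipelines.yml: sketch only; names marked "hypothetical" are assumptions.
    trigger:
      branches:
        include:
          - main

    pool:
      vmImage: ubuntu-latest

    variables:
      - group: databricks-ci              # hypothetical variable group holding secrets

    stages:
      - stage: Deploy_Dev
        jobs:
          - deployment: deploy_dev
            environment: databricks-dev   # hypothetical environment, no approval check
            strategy:
              runOnce:
                deploy:
                  steps:
                    - checkout: self      # deployment jobs do not check out the repo by default
                    - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                      displayName: Install Databricks CLI
                    - script: databricks bundle deploy -t dev
                      displayName: Deploy bundle to Dev
                      env:
                        DATABRICKS_TOKEN: $(DATABRICKS_DEV_TOKEN)   # hypothetical secret

      - stage: Deploy_Prod
        dependsOn: Deploy_Dev             # runs only after the Dev stage succeeds
        jobs:
          - deployment: deploy_prod
            environment: databricks-prod  # hypothetical environment with an approval check
            strategy:
              runOnce:
                deploy:
                  steps:
                    - checkout: self
                    - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                      displayName: Install Databricks CLI
                    - script: databricks bundle deploy -t prod
                      displayName: Deploy bundle to Prod
                      env:
                        DATABRICKS_TOKEN: $(DATABRICKS_PROD_TOKEN)  # hypothetical secret
    ```

    With this layout there is no long-lived dev branch: every merge to main reaches Dev first, and Prod waits on whatever approval the prod environment requires.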

    Q2 — Auto-discovery of new notebooks From the bundles best practices and library-dependency guidance:

    • Bundles require resources (jobs, pipelines, libraries) to be defined in source and referenced in databricks.yml.
    • The recommendation is to “reference the uploaded compiled library in databricks.yml” and to define resources declaratively.

    Implication:

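    • Adding a new notebook under the src path does not automatically create or wire up a new job/pipeline in the bundle. The notebook file itself is typically uploaded with the rest of the bundle's files on the next databricks bundle deploy, but nothing runs it until a resource references it.
    • Any new workflow (job, pipeline, etc.) must be explicitly declared in the bundle configuration, either in databricks.yml or in a resource file that it includes, so that databricks bundle validate and databricks bundle deploy know what to deploy.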
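
    As an illustration of declaring a job for a newly added notebook, the sketch below shows two files as two YAML documents. The bundle name, job name, file names, and notebook path are hypothetical, and compute settings are omitted (a real job would need a cluster definition or serverless compute, depending on the workspace).

    ```yaml
    # --- databricks.yml (fragment) ---
    bundle:
      name: my_bundle                  # hypothetical bundle name

    include:
      - resources/*.yml                # load every resource file matching this glob

    ---
    # --- resources/new_notebook_job.yml (hypothetical file picked up by the glob) ---
    resources:
      jobs:
        new_notebook_job:              # hypothetical job key
          name: new_notebook_job
          tasks:
            - task_key: run_new_notebook
              notebook_task:
                # Path is relative to this YAML file; the notebook name is hypothetical.
                notebook_path: ../src/new_notebook.ipynb
    ```

    With an include glob like this, a developer who adds a notebook only has to add a small resource file next to the others; databricks.yml itself does not need to change.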

    Q3 — Azure DevOps YAML pipeline triggers From the Azure DevOps + Databricks CI/CD guidance:

    • Pipelines are defined in YAML (azure-pipelines.yml) and can be customized per branch using the Git branch selector.
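    • The Azure DevOps guidance suggests not doing production work directly in main and using a dedicated branch (for example release) for production deployment. This differs from the bundle guidance in Q1, where main is deployed to production after staging checks, so choose one convention for the production branch and apply it consistently.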
    • Builds are typically triggered with every check-in to the relevant branch, and release pipelines or stages handle promotion across environments (Dev → QA → UAT → Staging → Prod).

    For multi-environment deployments with bundles, two common patterns are supported by the docs and DevOps guidance:

    1. Single multi-stage YAML pipeline
      • One azure-pipelines.yml with stages like Build, Deploy_Dev, Deploy_Prod.
      • Use trigger and/or condition on stages to run Dev on one branch and Prod on another, for example:
        • Trigger on both dev and main.
        • Deploy_Dev stage runs when Build.SourceBranch is refs/heads/dev.
        • Deploy_Prod stage runs when Build.SourceBranch is refs/heads/main.
      • This aligns with the “simple branching strategy” and “deploy multiple branches to different stages” guidance, where different branches feed different stages/environments.
    2. Separate YAML pipelines per environment
      • One YAML file bound to dev branch (deploys to Dev workspace).
      • Another YAML file bound to main (deploys to Prod workspace).
      • Azure DevOps supports customizing the build process per branch via the branch selector in the pipeline editor.

    Both are supported. The documentation leans toward a single pipeline with multiple stages and branch-based routing to stages when managing multiple environments, because it keeps the flow centralized and easier to reason about.
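
    A minimal sketch of the single-pipeline pattern with branch-based routing is shown below. The secret variable names are assumptions, the bundle root is assumed to be the repository root, and the stage names follow the ones mentioned above.

    ```yaml
    # azure-pipelines.yml: one pipeline, two branches, one deploy stage per branch.
    trigger:
      branches:
        include:
          - dev
          - main

    pool:
      vmImage: ubuntu-latest

    stages:
      - stage: Deploy_Dev
        # Run only for builds of the dev branch.
        condition: eq(variables['Build.SourceBranch'], 'refs/heads/dev')
        jobs:
          - job: deploy_dev
            steps:
              - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                displayName: Install Databricks CLI
              - script: |
                  databricks bundle validate -t dev
                  databricks bundle deploy -t dev
                displayName: Validate and deploy to Dev
                env:
                  DATABRICKS_TOKEN: $(DATABRICKS_DEV_TOKEN)    # hypothetical secret variable

      - stage: Deploy_Prod
        dependsOn: []                  # do not wait for (or depend on) the skipped Dev stage
        # Run only for builds of the main branch.
        condition: eq(variables['Build.SourceBranch'], 'refs/heads/main')
        jobs:
          - job: deploy_prod
            steps:
              - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                displayName: Install Databricks CLI
              - script: |
                  databricks bundle validate -t prod
                  databricks bundle deploy -t prod
                displayName: Validate and deploy to Prod
                env:
                  DATABRICKS_TOKEN: $(DATABRICKS_PROD_TOKEN)   # hypothetical secret variable
    ```

    The empty dependsOn on Deploy_Prod matters here: stages depend on the previous stage by default, so it is stated explicitly that the Prod stage is evaluated on its own condition.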

    Q4 — DAB target configuration The target configuration shown matches the documented pattern for bundles:

    targets:
      dev:
        mode: development
        workspace:
          host: https://<dev-workspace>.azuredatabricks.net
      prod:
        mode: production
        workspace:
          host: https://<prod-workspace>.azuredatabricks.net
    
    • Bundles explicitly support multiple targets (for example dev, prod) that map to different workspaces and modes.
    • The recommended workflow is to validate and deploy the bundle per target using databricks bundle validate and databricks bundle deploy for the appropriate target.

    For authentication:

    • The GitHub Actions example in the documentation uses a service principal and sets the DATABRICKS_TOKEN environment variable from a secret (SP_TOKEN).
    • The same pattern applies to Azure DevOps:
      • Use a service principal associated with the Databricks workspace.
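      • Generate a Databricks access token for that principal (or, where available, use OAuth or Microsoft Entra ID credentials for the service principal instead of a long-lived token).
      • Store the token as a secret variable in Azure DevOps (variable group or pipeline variable marked secret).
      • In the pipeline, map that secret explicitly into the step's environment as DATABRICKS_TOKEN (and any other required unified auth environment variables) before running bundle commands. Azure DevOps does not expose secret variables to scripts automatically.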

    Authentication details are typically not stored in databricks.yml targets; instead, targets define workspace hosts and modes, while credentials are injected at runtime via environment variables or service connections.
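
    The runtime injection described above could be sketched as follows. The variable group name and the secret variable names are assumptions; the commented alternative uses the Microsoft Entra ID service principal variables from Databricks unified authentication and is shown only as an option, not as what the source documentation prescribes.

    ```yaml
    # azure-pipelines.yml (fragment): credentials come from pipeline secrets, not databricks.yml.
    variables:
      - group: databricks-dev            # hypothetical variable group (can be Key Vault-backed)

    pool:
      vmImage: ubuntu-latest

    steps:
      - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
        displayName: Install Databricks CLI

      - script: databricks bundle deploy -t dev
        displayName: Deploy bundle to Dev
        env:
          # Secret variables must be mapped explicitly to reach the script's environment.
          DATABRICKS_TOKEN: $(DATABRICKS_DEV_TOKEN)   # hypothetical secret variable
          # Alternative (assumption): Microsoft Entra ID service principal credentials.
          # ARM_TENANT_ID: $(SP_TENANT_ID)
          # ARM_CLIENT_ID: $(SP_CLIENT_ID)
          # ARM_CLIENT_SECRET: $(SP_CLIENT_SECRET)
    ```

    The workspace host is not set here because each target in databricks.yml already defines workspace.host; only the credential is supplied by the pipeline.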

    Q5 — PR gate: deploy-before-merge vs merge-then-deploy From the documented CI/CD flow for bundles:

    • The recommended sequence is:
      • Develop and test in a development workspace.
      • Use feature branches and merge into main after testing.
      • CI/CD then deploys main to staging and, after tests, to production.

    This maps well to:

    • PR from feature → integration branch (dev or main).
    • CI validation on PR (build, databricks bundle validate, possibly a test deployment to a Dev workspace).
    • After PR approval and merge, the main pipeline deploys to the appropriate environment(s).

    The proposed plan:

    1. PR from feature into dev.
    2. CI pipeline deploys bundle to Dev workspace as a PR validation gate.
    3. If deployment succeeds, PR is approved and merged.
    4. Merge into main triggers deployment to Prod.

    This is compatible with the documented guidance, as long as:

    • The Dev deployment used as a gate is against a non-production workspace (which matches the “develop in dev workspace” recommendation).
    • Production deployment is only triggered from a stable branch (for example main or release) after tests and checks pass.

    The alternative, also supported by the docs, is to:

    • Run validation (including databricks bundle validate) on PR without deploying, then deploy to Dev only after merge.

    Both are valid; using a Dev deployment as a PR gate is a stricter form of validation and fits within the recommended CI/CD model for bundles.
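
    A PR gate along these lines might be sketched as below. In Azure Repos, PR runs are wired up through a build validation branch policy on the target branch, not through a pr key in the YAML, so the pipeline itself disables CI triggers. The file name and secret variable name are assumptions.

    ```yaml
    # pr-validation.yml (hypothetical file): attach via a build validation policy on the dev branch.
    trigger: none                        # no CI trigger; the branch policy queues the run per PR

    pool:
      vmImage: ubuntu-latest

    steps:
      - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
        displayName: Install Databricks CLI

      - script: databricks bundle validate -t dev
        displayName: Validate bundle configuration
        env:
          DATABRICKS_TOKEN: $(DATABRICKS_DEV_TOKEN)   # hypothetical secret variable

      # Optional stricter gate: actually deploy to the Dev workspace before merge.
      # Remove this step to validate only and deploy after merge instead.
      - script: databricks bundle deploy -t dev
        displayName: Deploy bundle to Dev (PR gate)
        env:
          DATABRICKS_TOKEN: $(DATABRICKS_DEV_TOKEN)
    ```

    One caveat with the stricter gate: if every PR run deploys as the same principal to the same target, concurrent PRs are likely to overwrite each other's Dev deployment, so teams with many open PRs may prefer the validate-only variant.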


    AI-generated content may be incorrect. Read our transparency notes for more information.
