Continuous Cross-Tenant Azure SQL Database Synchronization Using Azure Databricks

Mallikarjun appani 6 Reputation points
2026-05-18T09:50:56.9866667+00:00

Dear Microsoft Support Team,

We are working on a requirement to implement a continuous data synchronization pipeline between two Azure SQL Database environments hosted in different Microsoft Entra tenants using Azure Databricks.

Requirement Overview:

We need to establish a near real-time, continuous sync mechanism in which data changes in a source Azure SQL Database in one tenant are continuously replicated to a target Azure SQL Database in another tenant through Azure Databricks.
Could you please suggest the best way to meet this requirement?

Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.

2 answers

  1. SAI JAGADEESH KUDIPUDI 3,210 Reputation points Microsoft External Staff Moderator
    2026-05-19T00:05:55.46+00:00

    Hi Mallikarjun appani,

    Thank you for reaching out. This is a very relevant scenario, and we often see similar requirements when syncing data across Azure SQL Databases in different Entra tenants.

    For a near real-time / continuous synchronization, the most suitable and scalable approach is to build a Change Data Capture (CDC)-based incremental pipeline using Azure Databricks.

    1. Enable Change Tracking or CDC on the source database
      Start by enabling Change Tracking (lightweight) or Change Data Capture (CDC) on the source tables.
    • Change Tracking helps identify which rows have changed
    • CDC captures full insert/update/delete history, which is better suited for streaming scenarios
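
As a rough illustration of what "enabling" means in practice, the sketch below builds the T-SQL statements for either option. The function names and the example database, schema, and table names are hypothetical; the statements would still have to be run against the source database by an administrator, and object names are assumed to be trusted (they are not escaped here).

```python
from typing import List


def change_tracking_statements(database: str, schema: str, table: str,
                               retention_days: int = 2) -> List[str]:
    """Return T-SQL that turns on Change Tracking for a database and one table."""
    return [
        # Database-level switch; retention controls how long change info is kept.
        f"ALTER DATABASE [{database}] SET CHANGE_TRACKING = ON "
        f"(CHANGE_RETENTION = {retention_days} DAYS, AUTO_CLEANUP = ON);",
        # Table-level switch; the table needs a primary key for this to succeed.
        f"ALTER TABLE [{schema}].[{table}] ENABLE CHANGE_TRACKING "
        f"WITH (TRACK_COLUMNS_UPDATED = ON);",
    ]


def cdc_statements(schema: str, table: str) -> List[str]:
    """Return T-SQL that turns on Change Data Capture for one table."""
    return [
        # Run once per database, while connected to that database.
        "EXEC sys.sp_cdc_enable_db;",
        # Run once per captured table; @role_name = NULL means no gating role.
        f"EXEC sys.sp_cdc_enable_table @source_schema = N'{schema}', "
        f"@source_name = N'{table}', @role_name = NULL;",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in change_tracking_statements("SalesDb", "dbo", "Orders"):
        print(statement)
    for statement in cdc_statements("dbo", "Orders"):
        print(statement)
```
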
    2. Capture and stage changes in ADLS Gen2
      Use an ingestion mechanism (for example, incremental extraction or CDC tools) to land the following in Azure Data Lake Storage Gen2:
    • An initial full snapshot
    • Ongoing change data

    This provides a reliable staging layer for downstream processing.
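
If Change Tracking is the capture mechanism, incremental extraction usually comes down to querying `CHANGETABLE(CHANGES ...)` from the last synchronized version and then advancing that version. The sketch below only builds the query and computes the next watermark; the helper names, the column lists, and the idea of keeping the watermark in a small state store are assumptions for illustration, not a prescribed design.

```python
from typing import Dict, Iterable, List


def incremental_query(schema: str, table: str, key_columns: List[str],
                      value_columns: List[str], last_sync_version: int) -> str:
    """Build a query returning rows changed after last_sync_version."""
    keys = ", ".join(f"ct.[{k}]" for k in key_columns)
    values = ", ".join(f"src.[{c}]" for c in value_columns)
    join = " AND ".join(f"src.[{k}] = ct.[{k}]" for k in key_columns)
    return (
        f"SELECT {keys}, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION, {values} "
        f"FROM CHANGETABLE(CHANGES [{schema}].[{table}], {int(last_sync_version)}) AS ct "
        # LEFT JOIN so deleted rows (no longer in the table) are still returned.
        f"LEFT JOIN [{schema}].[{table}] AS src ON {join}"
    )


def next_watermark(rows: Iterable[Dict], last_sync_version: int) -> int:
    """Return the highest change version seen, or the old one if nothing changed."""
    versions = [row["SYS_CHANGE_VERSION"] for row in rows]
    return max(versions, default=last_sync_version)


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    print(incremental_query("dbo", "Orders", ["OrderId"], ["Status", "Amount"], 42))
```

Each extraction run would write its result set to a new folder or file in the staging container and persist the returned watermark only after the write succeeds, so a failed run can be retried from the same version.
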

    3. Process changes using Azure Databricks
      In Databricks, you can use:
    • Structured Streaming for continuous micro-batch processing
    • AUTO CDC pipelines to simplify change handling

    These capabilities are designed to process data continuously and keep downstream systems in sync.
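
Whichever option is used, each micro-batch typically has to reduce the incoming change events to the most recent event per key before anything is written downstream. The plain-Python sketch below shows that reduction on an in-memory list; the field names `id`, `seq`, and `op` are hypothetical. In a Databricks job the same logic would normally be expressed over a DataFrame, or delegated to the sequencing column that AUTO CDC accepts.

```python
from typing import Dict, Hashable, Iterable


def latest_change_per_key(events: Iterable[Dict], key: str = "id",
                          sequence: str = "seq") -> Dict[Hashable, Dict]:
    """Keep only the highest-sequence event for each key in one micro-batch."""
    latest: Dict[Hashable, Dict] = {}
    for event in events:
        current = latest.get(event[key])
        # Replace the stored event only when this one is strictly newer.
        if current is None or event[sequence] > current[sequence]:
            latest[event[key]] = event
    return latest


if __name__ == "__main__":
    batch = [
        {"id": 1, "seq": 10, "op": "I", "status": "new"},
        {"id": 1, "seq": 12, "op": "U", "status": "shipped"},
        {"id": 2, "seq": 11, "op": "I", "status": "new"},
        {"id": 1, "seq": 11, "op": "U", "status": "paid"},  # arrives out of order
    ]
    for row in latest_change_per_key(batch).values():
        print(row)
```
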

    4. Apply incremental changes to the target Azure SQL Database
      Use merge/upsert logic to apply:
    • Inserts
    • Updates
    • Deletes

    This ensures the target database remains aligned with the source.
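
One common way to implement the merge/upsert step is to write each batch of changes to a staging table in the target database and then run a single T-SQL `MERGE`. The sketch below only generates that statement. The table names, the `op` column holding `I`/`U`/`D`, and the staging-table approach itself are assumptions for illustration; the staging table is also assumed to hold at most one row per key, since `MERGE` fails when several source rows match the same target row.

```python
from typing import List


def build_merge(target: str, staging: str, keys: List[str],
                columns: List[str], op_column: str = "op") -> str:
    """Build a T-SQL MERGE that applies inserts, updates, and deletes."""
    on = " AND ".join(f"tgt.[{k}] = src.[{k}]" for k in keys)
    updates = ", ".join(f"tgt.[{c}] = src.[{c}]" for c in columns)
    all_columns = keys + columns
    insert_cols = ", ".join(f"[{c}]" for c in all_columns)
    insert_vals = ", ".join(f"src.[{c}]" for c in all_columns)
    return (
        f"MERGE INTO {target} AS tgt\n"
        f"USING {staging} AS src\n"
        f"ON {on}\n"
        # Deletes are checked first; the remaining matches are updates.
        f"WHEN MATCHED AND src.[{op_column}] = 'D' THEN DELETE\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        # Skip rows that were inserted and deleted before they ever reached the target.
        f"WHEN NOT MATCHED BY TARGET AND src.[{op_column}] <> 'D' THEN\n"
        f"    INSERT ({insert_cols}) VALUES ({insert_vals});"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(build_merge("[dbo].[Orders]", "[stg].[Orders_changes]",
                      ["OrderId"], ["Status", "Amount"]))
```
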

    5. Configure secure cross-tenant connectivity
      Since this is a cross-tenant setup, make sure to configure:
    • Microsoft Entra ID (formerly Azure AD) service principal authentication
    • Appropriate networking (Private Endpoints or firewall rules)
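
To make the authentication part more concrete, the sketch below assembles JDBC connection settings for one side of the sync, using the Microsoft JDBC driver's `ActiveDirectoryServicePrincipal` authentication mode. The `SqlEndpoint` class, the server and database names, and the placeholder secret are hypothetical. In a real job the client secret would come from a secret store rather than from code, and the service principal must exist in (or be granted access from) the tenant that owns each database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SqlEndpoint:
    """Connection details for one Azure SQL Database and its service principal."""
    server: str      # e.g. "<name>.database.windows.net"
    database: str
    client_id: str   # application (client) ID of the service principal

    def jdbc_url(self) -> str:
        return (
            f"jdbc:sqlserver://{self.server}:1433;database={self.database};"
            "encrypt=true;trustServerCertificate=false;"
            "authentication=ActiveDirectoryServicePrincipal"
        )


def jdbc_options(endpoint: SqlEndpoint, client_secret: str, table: str) -> Dict[str, str]:
    """Return options in the shape expected by a Spark JDBC reader or writer."""
    return {
        "url": endpoint.jdbc_url(),
        "dbtable": table,
        "user": endpoint.client_id,
        "password": client_secret,
    }


if __name__ == "__main__":
    # Hypothetical endpoints: one per tenant, each with its own identity.
    source = SqlEndpoint("src-server.database.windows.net", "SalesDb", "source-app-id")
    target = SqlEndpoint("tgt-server.database.windows.net", "SalesReplica", "target-app-id")
    print(jdbc_options(source, "<secret-from-vault>", "dbo.Orders")["url"])
    print(jdbc_options(target, "<secret-from-vault>", "dbo.Orders")["url"])
```

Keeping a separate endpoint object and identity per tenant makes it easier to rotate credentials and restrict permissions independently (read-only on the source, write on the target).
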

    As an alternative, you could consider Azure SQL Data Sync, which synchronizes data between Azure SQL databases (including cross-tenant scenarios).

    However, SQL Data Sync:

    • Operates on scheduled intervals (about 5 minutes or more)
    • Offers limited flexibility for transformation scenarios
    • Is planned for retirement by September 30, 2027, so it is not recommended for new designs

    For your requirement (cross-tenant + near real-time + continuous sync), I recommend:

    • Using Change Tracking / CDC for incremental capture
    • Processing with Azure Databricks (Structured Streaming or AUTO CDC)
    • Applying merge/upsert logic to the target Azure SQL Database

    Since this involves architecture decisions (cross-tenant networking, security, and design patterns), I would also recommend engaging your Solution Architect (SA) to review and validate the end-to-end architecture based on your organization’s setup and constraints.

    References:
    • Query streaming data in Azure Databricks
    • Replicate external database using AUTO CDC in Databricks
    • What is SQL Data Sync
    • SQL Data Sync retirement guidance

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

  2. AI answer

    2026-05-18T09:51:10.31+00:00

    A suitable approach is to treat this as a cross-tenant, near real-time replication problem and use change data capture (CDC) into cloud storage plus Azure Databricks ingestion, then write to the target Azure SQL Database.

    A high-level architecture that aligns with the documented patterns is:

    1. Enable change capture on the source Azure SQL Database.
      • Use a CDC mechanism (for example, Debezium running on Kafka Connect, or another log-based CDC tool) to continuously capture inserts/updates/deletes from the source Azure SQL Database.
      • Land both an initial full snapshot and the ongoing change feed into a cloud storage account (for example, Azure Data Lake Storage) that Azure Databricks can access.
    2. Use Azure Databricks AUTO CDC pipelines to ingest from storage.
      • Follow the pattern described for replicating an external RDBMS table using AUTO CDC:
        • One-time full copy (once flow) from the snapshot dataset to initialize the target state.
        • Continuous change flow that reads the CDC feed from storage and applies changes.
      • This pattern is explicitly recommended for “keeping a target table in sync with an external system of record” and for slowly changing dimensions.
    3. Write from Azure Databricks into the target Azure SQL Database in the other tenant.
      • Configure cross-tenant access so that Databricks can authenticate to the target Azure SQL Database (for example, using a service principal in the target tenant and appropriate firewall/network rules).
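      • AUTO CDC flows write to streaming tables inside Databricks, so add a downstream step (for example, a Structured Streaming foreachBatch writer over JDBC) that applies the resulting incremental changes (upserts/deletes) to the target Azure SQL Database table.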
    4. Choose the appropriate sync mode for Databricks pipelines over the lakehouse layer (if using Lakehouse / Lakebase Provisioned synced tables):
      • Use Continuous sync mode when changes must appear in near real time. The pipeline runs once for the initial snapshot and then continuously applies incremental changes.
      • Ensure the source Delta tables in the lake (fed by AUTO CDC) have Change Data Feed enabled if using Triggered or Continuous modes.
    5. Handle cross-tenant specifics.
      • Cross-tenant synchronization is supported at the application level (Databricks + service principals) even though some PaaS sync services (like SQL Data Sync) have limitations across tenants or clouds.
      • Use separate identities and connections for source and target tenants, and manage secrets/credentials in a secure store (for example, Azure Key Vault) that Databricks can access.
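
The "one-time full copy, then continuous change flow" pattern in the steps above can be illustrated with a small in-memory model: load the snapshot once, then apply change events in sequence order. This is only a sketch of the semantics, not Databricks code; the field names `id`, `seq`, and `op` and the operation labels are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def apply_snapshot(rows: Iterable[Dict], key: str = "id") -> Dict[Hashable, Dict]:
    """Initialize the replica state from a full snapshot (the one-time flow)."""
    return {row[key]: dict(row) for row in rows}


def apply_changes(state: Dict[Hashable, Dict], changes: Iterable[Dict],
                  key: str = "id", sequence: str = "seq",
                  op: str = "op") -> Dict[Hashable, Dict]:
    """Apply change events in sequence order (the continuous change flow)."""
    for change in sorted(changes, key=lambda c: c[sequence]):
        if change[op] == "delete":
            state.pop(change[key], None)
        else:
            # Inserts and updates are both treated as upserts; metadata fields are dropped.
            state[change[key]] = {k: v for k, v in change.items()
                                  if k not in (sequence, op)}
    return state


if __name__ == "__main__":
    replica = apply_snapshot([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
    feed = [
        {"id": 2, "op": "delete", "seq": 3},
        {"id": 1, "op": "update", "seq": 2, "name": "a2"},
        {"id": 3, "op": "insert", "seq": 1, "name": "c"},
    ]
    print(apply_changes(replica, feed))
```

Ordering within a batch is handled here by sorting on the sequence field. Events that arrive late across batches would additionally require remembering the last applied sequence per key, which this sketch leaves out.
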

    This design gives:

    • Near real-time replication (driven by the CDC feed and Continuous Databricks pipelines).
    • Decoupling between tenants via storage and Databricks.
    • Flexibility to extend the pipeline for transformations or additional targets.

    AI-generated content may be incorrect. Read our transparency notes for more information.
