Thank you for reaching out — this is a very relevant scenario, and we often see similar requirements when syncing data across Azure SQL Databases in different Entra tenants.
For near real-time, continuous synchronization, the most suitable and scalable approach is to build a Change Data Capture (CDC)-based incremental pipeline using Azure Databricks.
- Enable Change Tracking or CDC on the source database
  Start by enabling Change Tracking (lightweight) or Change Data Capture (CDC) on the source database and on each table you want to sync.
  - Change Tracking identifies which rows have changed and how (insert, update, or delete), but does not keep the intermediate values
  - CDC captures the full insert/update/delete history, which makes it better suited to streaming scenarios
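As a rough illustration of this first step, the sketch below builds the T-SQL statements that turn on either feature. The helper names, the retention period, and the example database and table names are my own placeholders, and the identifiers are assumed to come from trusted configuration rather than user input.

```python
from __future__ import annotations


def change_tracking_sql(database: str, schema: str, table: str,
                        retention_days: int = 2) -> list[str]:
    """Return T-SQL that enables Change Tracking on a database and one table."""
    return [
        # Database-level switch; the retention period here is an arbitrary example.
        f"ALTER DATABASE [{database}] SET CHANGE_TRACKING = ON "
        f"(CHANGE_RETENTION = {retention_days} DAYS, AUTO_CLEANUP = ON);",
        # Table-level switch; must be repeated for each table to sync.
        f"ALTER TABLE [{schema}].[{table}] ENABLE CHANGE_TRACKING "
        f"WITH (TRACK_COLUMNS_UPDATED = ON);",
    ]


def cdc_sql(schema: str, table: str) -> list[str]:
    """Return T-SQL that enables CDC on the current database and one table."""
    return [
        # Run once per database.
        "EXEC sys.sp_cdc_enable_db;",
        # Run once per table; @role_name = NULL means no gating role is used.
        f"EXEC sys.sp_cdc_enable_table @source_schema = N'{schema}', "
        f"@source_name = N'{table}', @role_name = NULL;",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in change_tracking_sql("SalesDb", "dbo", "Orders"):
        print(statement)
    for statement in cdc_sql("dbo", "Orders"):
        print(statement)
```

You would run the generated statements against the source database with sufficient permissions. Pick one of the two mechanisms per table rather than enabling both.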
- Capture and stage changes in ADLS Gen2
  Use an ingestion mechanism (for example, incremental extraction or CDC tools) to land the following in Azure Data Lake Storage Gen2:
  - An initial full snapshot
  - Ongoing change data
  This provides a reliable staging layer for downstream processing.
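If you go the Change Tracking route, the incremental extraction usually comes down to a query against `CHANGETABLE` driven by a stored version watermark. Here is a minimal sketch that builds such a query. The function name, the column names, and the way the watermark is stored are assumptions on my side, and the CDC route would query the CDC change functions instead.

```python
from __future__ import annotations

# Query to read the version to store as the next watermark after a successful load.
CURRENT_VERSION_QUERY = "SELECT CHANGE_TRACKING_CURRENT_VERSION() AS current_version"


def build_change_query(schema: str, table: str, key_columns: list[str],
                       value_columns: list[str], last_sync_version: int) -> str:
    """Build a query returning rows changed since last_sync_version."""
    keys = ", ".join(f"ct.[{k}]" for k in key_columns)
    values = ", ".join(f"src.[{c}]" for c in value_columns)
    join = " AND ".join(f"ct.[{k}] = src.[{k}]" for k in key_columns)
    return (
        f"SELECT {keys}, {values}, "
        "ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION "
        f"FROM CHANGETABLE(CHANGES [{schema}].[{table}], {int(last_sync_version)}) AS ct "
        # LEFT JOIN keeps deleted rows: they no longer exist in the source table,
        # so their value columns come back as NULL.
        f"LEFT JOIN [{schema}].[{table}] AS src ON {join}"
    )


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    print(build_change_query("dbo", "Orders", ["OrderId"], ["Status", "Amount"], 42))
```

The result set of each run can then be written to ADLS Gen2 as a new file or batch, with the watermark advanced only after the write succeeds.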
- Process changes using Azure Databricks
  In Databricks, you can use either of the following:
  - Structured Streaming for continuous micro-batch processing
  - AUTO CDC pipelines to simplify change handling
  Both are designed to process data continuously and keep downstream systems in sync.
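Whichever option you choose, the core of change handling is the same idea: for each key, keep only the most recent change according to a sequencing column, even if events arrive out of order. The plain-Python sketch below illustrates that logic only. It is not the Databricks API, and the field names `id`, `op`, and `seq` are hypothetical.

```python
from __future__ import annotations

from typing import Any


def latest_change_per_key(events: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    """Collapse a change feed to the newest event per key, ordered by 'seq'."""
    latest: dict[Any, dict[str, Any]] = {}
    for event in events:
        key = event["id"]
        # Keep the event with the highest sequence number, regardless of arrival order.
        if key not in latest or event["seq"] > latest[key]["seq"]:
            latest[key] = event
    return latest


if __name__ == "__main__":
    feed = [
        {"id": 1, "op": "I", "seq": 1, "status": "new"},
        {"id": 1, "op": "U", "seq": 3, "status": "shipped"},
        {"id": 1, "op": "U", "seq": 2, "status": "paid"},  # arrives late
        {"id": 2, "op": "I", "seq": 1, "status": "new"},
        {"id": 2, "op": "D", "seq": 4, "status": None},
    ]
    for key, event in latest_change_per_key(feed).items():
        print(key, event["op"], event["seq"])
```

In a real pipeline the same deduplication runs per micro-batch, and the surviving rows are what gets applied to the target.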
- Apply incremental changes to the target Azure SQL Database
  Use merge/upsert logic to apply:
  - Inserts
  - Updates
  - Deletes
  This ensures the target database remains aligned with the source.
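One common way to do this is to write each batch of changes to a staging table in the target database and then run a T-SQL `MERGE` from it. The sketch below builds such a statement. The staging-table pattern, the function name, and the `op` column holding `I`/`U`/`D` are assumptions for illustration, and the staging table is assumed to hold at most one row per key with at least one non-key column.

```python
from __future__ import annotations


def build_merge_sql(target: str, staging: str, key_columns: list[str],
                    value_columns: list[str], op_column: str = "op") -> str:
    """Build a T-SQL MERGE that applies inserts, updates, and deletes."""
    on = " AND ".join(f"t.[{k}] = s.[{k}]" for k in key_columns)
    set_clause = ", ".join(f"t.[{c}] = s.[{c}]" for c in value_columns)
    all_columns = list(key_columns) + list(value_columns)
    insert_columns = ", ".join(f"[{c}]" for c in all_columns)
    insert_values = ", ".join(f"s.[{c}]" for c in all_columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {staging} AS s\n"
        f"    ON {on}\n"
        # Deletes: the row exists in the target and the change is a delete.
        f"WHEN MATCHED AND s.[{op_column}] = 'D' THEN DELETE\n"
        # Updates: any other change to an existing row.
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        # Inserts: the row is new to the target (skip rows already deleted at source).
        f"WHEN NOT MATCHED BY TARGET AND s.[{op_column}] <> 'D' THEN\n"
        f"    INSERT ({insert_columns}) VALUES ({insert_values});"
    )


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    print(build_merge_sql("dbo.Orders", "staging.Orders_changes",
                          ["OrderId"], ["Status", "Amount"]))
```

Running the merge inside a transaction and truncating the staging table afterwards keeps each batch atomic and repeatable.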
- Configure secure cross-tenant connectivity
  Since this is a cross-tenant setup, make sure to configure:
  - Microsoft Entra ID (formerly Azure AD) service principal authentication
  - Appropriate networking (Private Endpoints or firewall rules)
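For the authentication part, a JDBC connection from Databricks with a service principal might be put together roughly as below. This is a sketch under assumptions: the server and database names are placeholders, the secret is expected to come from a secret store rather than code, and the principal is assumed to be already registered in the database's tenant and created as a database user. How you achieve that across tenants depends on your setup and is not shown.

```python
from __future__ import annotations


def build_jdbc_url(server: str, database: str) -> str:
    """Build a SQL Server JDBC URL that uses service principal authentication."""
    return (
        f"jdbc:sqlserver://{server}:1433;"
        f"database={database};"
        "encrypt=true;trustServerCertificate=false;"
        "hostNameInCertificate=*.database.windows.net;"
        "loginTimeout=30;"
        "authentication=ActiveDirectoryServicePrincipal"
    )


def connection_properties(client_id: str, client_secret: str) -> dict[str, str]:
    """Return JDBC properties: the app (client) ID as user, its secret as password."""
    return {
        "user": client_id,
        "password": client_secret,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


if __name__ == "__main__":
    # Placeholder values; never hard-code real secrets.
    print(build_jdbc_url("target-server.database.windows.net", "TargetDb"))
    print(sorted(connection_properties("<client-id>", "<client-secret>").keys()))
```

The URL and properties can then be passed to whichever JDBC reader or writer you use in the pipeline. The networking side (Private Endpoints or firewall rules) still has to allow the Databricks workspace to reach both servers.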
As an alternative, you could consider Azure SQL Data Sync, which synchronizes data between Azure SQL databases (including cross-tenant scenarios).
However:
- It runs on scheduled intervals (~5 minutes or more) rather than continuously
- It offers limited flexibility for transformation scenarios
- It is planned for retirement by September 30, 2027, so it is not recommended for new designs
For your requirement (cross-tenant + near real-time + continuous sync), I recommend:
- Using Change Tracking / CDC for incremental capture
- Processing with Azure Databricks (Structured Streaming or AUTO CDC)
- Applying merge/upsert logic to the target Azure SQL Database
Since this involves architecture decisions (cross-tenant networking, security, and design patterns), I would also recommend engaging your Solution Architect (SA) to review and validate the end-to-end architecture based on your organization’s setup and constraints.
References:
- Query streaming data in Azure Databricks
- Replicate external database using AUTO CDC in Databricks
- What is SQL Data Sync
- SQL Data Sync retirement guidance
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.