Will Azure Purview fetch lineage if system.access.table_lineage in Unity Catalog has a NULL entity_run_id?

Priya Yadav 20

Hi All ,

We have a job in Azure Databricks that uses dbutils.notebook.run to execute notebooks. After running it, we observed that the system.access.table_lineage table contains the lineage entry, but its entity_run_id is NULL.

I understand that Purview reads this table to fetch lineage.

Please help me understand whether Purview will fetch lineage from Azure Databricks Unity Catalog if rows in system.access.table_lineage have a NULL entity_run_id.

Accepted answer

Answer 1

Smaran Thoomu 22,505 Microsoft External Staff

Hi @Priya Yadav

Thanks for bringing this up. You’re right - Azure Purview relies on the entity_run_id in the system.access.table_lineage table to properly trace data lineage from Databricks notebooks.

If entity_run_id is NULL, Purview won't be able to associate that lineage information with a specific run, which means the lineage graph in Purview may be incomplete or not show up at all. This is a known limitation when using dbutils.notebook.run, as it doesn’t always pass the execution context fully, especially in chained notebook scenarios.
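
To gauge how many lineage rows are affected, a query along the following lines may help. This is only a sketch: it assumes the column names entity_type, entity_run_id, and event_date as documented for the lineage system table, and the helper name null_run_id_query and the 7-day window are illustrative choices, not anything Purview or Databricks prescribes.

```python
def null_run_id_query(days: int = 7) -> str:
    """Build a SQL query that counts lineage rows with a NULL entity_run_id,
    grouped by entity type, over the last `days` days."""
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    # Column names are assumed from the documented table_lineage schema.
    return f"""
SELECT
  entity_type,
  COUNT(*) AS total_rows,
  SUM(CASE WHEN entity_run_id IS NULL THEN 1 ELSE 0 END) AS null_run_id_rows
FROM system.access.table_lineage
WHERE event_date >= date_sub(current_date(), {days})
GROUP BY entity_type
ORDER BY null_run_id_rows DESC
""".strip()


query = null_run_id_query(days=7)
print(query)

# In a Databricks notebook, where `spark` is predefined, you could run:
# display(spark.sql(query))
```

If most of the NULL rows belong to notebooks started through dbutils.notebook.run, that would be consistent with the behaviour described above.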

To capture complete lineage, it's generally recommended to use Databricks Jobs with task-based orchestration instead. This ensures that the lineage metadata is recorded properly, including the entity_run_id, and improves the visibility in Purview.

Hope this helps clarify things - happy to dive deeper into your setup if needed.

Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

Priya Yadav 20 Reputation points

2025-04-23T12:00:25.63+00:00

Thank you for the quick response.

Can you elaborate further on this: "it's generally recommended to use Databricks Jobs with task-based orchestration instead"?

Thanks

Smaran Thoomu 22,505 Reputation points Microsoft External Staff

2025-04-23T12:15:43.6633333+00:00

@Priya Yadav When I mentioned “Databricks Jobs with task-based orchestration,” I was referring to creating a multi-task job in Databricks, where each notebook or task is defined explicitly within a job workflow. This is different from chaining notebooks using dbutils.notebook.run, which doesn’t always track lineage context reliably.

Here’s why using task-based jobs is better for lineage capture:

Each task in a job gets a unique run_id and execution context, which is what Purview relies on to track data movement and generate lineage.

Databricks Jobs API logs execution metadata more consistently compared to ad hoc or nested notebook runs.

If your job orchestrates multiple notebooks or scripts, defining them as separate tasks within a single job helps retain a complete lineage trail.

In short: Instead of calling Notebook B from Notebook A using dbutils.notebook.run, it's better to define both A and B as separate tasks under one Databricks Job.
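
As a rough illustration of that layout, the sketch below builds a job definition with two notebook tasks, where B depends on A. The field names (tasks, task_key, depends_on, notebook_task, existing_cluster_id) follow the Databricks Jobs API 2.1 job settings format; the job name, notebook paths, cluster ID placeholder, and the helper build_two_task_job are hypothetical and would need to be replaced with your own values.

```python
import json


def build_two_task_job(job_name, notebook_a_path, notebook_b_path, cluster_id):
    """Return a Jobs API 2.1 style job definition with two notebook tasks.

    Task B declares a dependency on task A, so the job scheduler runs them in
    order instead of notebook A calling notebook B via dbutils.notebook.run.
    """
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "notebook_a",
                "notebook_task": {"notebook_path": notebook_a_path},
                "existing_cluster_id": cluster_id,
            },
            {
                "task_key": "notebook_b",
                # Runs only after notebook_a has finished successfully.
                "depends_on": [{"task_key": "notebook_a"}],
                "notebook_task": {"notebook_path": notebook_b_path},
                "existing_cluster_id": cluster_id,
            },
        ],
    }


# All values below are placeholders for illustration only.
payload = build_two_task_job(
    job_name="lineage-friendly-job",
    notebook_a_path="/Workspace/Shared/notebook_a",
    notebook_b_path="/Workspace/Shared/notebook_b",
    cluster_id="<your-cluster-id>",
)
print(json.dumps(payload, indent=2))
```

The resulting JSON could be pasted into the job's JSON editor or sent to the jobs create endpoint. Any values that notebook A previously passed to B through dbutils.notebook.run arguments would need to move to task parameters or task values.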

I hope this information helps. Please do let us know if you have any further queries.

Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
