Synapse: Create a Lake Database table from a Parquet file using Python

Will 0 Reputation points
2025-03-23T18:04:12.6633333+00:00

I have a Lake Database in Synapse (not a SQL Database). To date I have been creating tables manually in the workspace GUI by choosing "Create external table from data lake" and selecting a Parquet file from my Azure Storage. I want to use Python to do this so I don't need to create tables manually.

I have tried Spark:

USE testdb;
CREATE EXTERNAL TABLE testdb.firsttable
(
  Date TIMESTAMP,
  Total_Cost DOUBLE
)
USING parquet
LOCATION 'abfss://....dfs.core.windows.net/data/firsttable.parquet'

This executes without error but does not actually update the database in my Synapse workspace.

I have also tried SQL scripts (under the "Develop" menu, with the hope of replicating this via JDBC later), but that fails with: Operation CREATE EXTERNAL TABLE is not allowed for a replicated database.

So what's the correct way to create an external table in my Lake Database from a Parquet file in Python?

Azure Synapse Analytics

1 answer

  1. Amira Bedhiafi 31,391 Reputation points
    2025-03-23T21:16:15.3333333+00:00

    Hello Will!

    Thank you for posting on Microsoft Learn.

    You need to use spark.sql() together with spark.catalog.setCurrentDatabase():

    # Name of the Lake Database and path to the source Parquet file in ADLS Gen2
    lake_db = "testdb"
    parquet_path = "abfss://<container>@<storageaccount>.dfs.core.windows.net/data/firsttable.parquet"
    
    # Make the Lake Database the current database for this Spark session
    spark.catalog.setCurrentDatabase(lake_db)
    
    # Register an external (unmanaged) table over the existing Parquet data
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS firsttable
        USING PARQUET
        LOCATION '{parquet_path}'
    """)
    

    Lake Databases are Spark-managed; you must use the Spark engine, not T-SQL.

    You must use spark.catalog.setCurrentDatabase() to make sure that the table is created in the desired Lake Database.
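
    If you prefer not to change the session's current database, you can also qualify the table name with the database name directly. A minimal sketch, reusing the lake_db and parquet_path variables from the snippet above:

    # Equivalent sketch: qualify the table with the database name
    # instead of calling setCurrentDatabase() first.
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {lake_db}.firsttable
        USING PARQUET
        LOCATION '{parquet_path}'
    """)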

    The CREATE EXTERNAL TABLE ... syntax you ran as a SQL script is from the T-SQL side (serverless SQL pool) and won't update the Spark-based Lake Database catalog, which is why you don't see the table in the workspace.

    The table you create this way will appear in the Synapse Studio > Data > Lake databases > testdb section.
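
    You can verify this from the notebook as well by listing the tables registered in the Lake Database (a quick check, assuming the same testdb name as above):

    # Quick sanity check: list the tables the Lake Database knows about
    spark.sql("SHOW TABLES IN testdb").show()
    print(spark.catalog.listTables("testdb"))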

    If you want additional benefits like ACID transactions, consider writing your table as a Delta Table instead of pure Parquet:

    # Path for the Delta table (elided here; fill in your own container and account)
    delta_path = "abfss://.../data/deltatable"
    
    # Read the Parquet data and rewrite it in Delta format
    df = spark.read.parquet(parquet_path)
    df.write.format("delta").save(delta_path)
    
    # Register the Delta files as a table in the Lake Database
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS deltatable
        USING DELTA
        LOCATION '{delta_path}'
    """)
    
