Synapse: Create a Lake Database table from a Parquet file using Python

Will 0 Reputation points
2025-03-23T18:04:12.6633333+00:00

I have a Lake Database in Synapse (not a SQL Database). To date I have been creating tables manually in the workspace GUI by choosing "Create external table from data lake" and selecting a Parquet file from my Azure Storage. I want to use Python to do this so I don't need to create tables manually.

I have tried Spark:

USE testdb;
CREATE EXTERNAL TABLE testdb.firsttable
(
  Date TIMESTAMP,
  Total_Cost DOUBLE
)
USING parquet
LOCATION 'abfss://....dfs.core.windows.net/data/firsttable.parquet'

This executes without error but does not actually update the database in my Synapse workspace.

I have also tried SQL scripts (under the "Develop" menu, with the hope of replicating this via JDBC later), but that fails with: Operation CREATE EXTERNAL TABLE is not allowed for a replicated database.

So what's the correct way to create an external table in my Lake Database from a Parquet file in Python?

Azure Synapse Analytics

1 answer

  1. Amira Bedhiafi 31,391 Reputation points
    2025-03-23T21:16:15.3333333+00:00

    Hello Will!

    Thank you for posting on Microsoft Learn.

    You need to use spark.sql() together with spark.catalog.setCurrentDatabase():

    # Name of the Lake Database and path to the source Parquet file in ADLS Gen2
    lake_db = "testdb"
    parquet_path = "abfss://<container>@<storageaccount>.dfs.core.windows.net/data/firsttable.parquet"
    
    # Make the Lake Database the current database for this Spark session
    spark.catalog.setCurrentDatabase(lake_db)
    
    # Register an external (unmanaged) table over the existing Parquet data
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS firsttable
        USING PARQUET
        LOCATION '{parquet_path}'
    """)
    

    Lake Databases are Spark-managed; you must use the Spark engine, not T-SQL.

    You must use spark.catalog.setCurrentDatabase() to make sure that the table is created in the desired Lake Database.
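
    If you prefer not to change the session's current database, you can also qualify the table name with the database name directly. A minimal sketch, reusing the lake_db and parquet_path variables from the snippet above:

    # Equivalent sketch: qualify the table with the database name
    # instead of calling setCurrentDatabase() first.
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {lake_db}.firsttable
        USING PARQUET
        LOCATION '{parquet_path}'
    """)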

    The CREATE EXTERNAL TABLE ... syntax you ran as a SQL script is from the T-SQL side (serverless SQL pool) and won't update the Spark-based Lake Database catalog, which is why you don't see the table in the workspace.

    The table you create this way will appear in the Synapse Studio > Data > Lake databases > testdb section.
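
    You can verify this from the notebook as well by listing the tables registered in the Lake Database (a quick check, assuming the same testdb name as above):

    # Quick sanity check: list the tables the Lake Database knows about
    spark.sql("SHOW TABLES IN testdb").show()
    print(spark.catalog.listTables("testdb"))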

    If you want additional benefits like ACID transactions, consider writing your table as a Delta Table instead of pure Parquet:

    # Path for the Delta table (elided here; fill in your own container and account)
    delta_path = "abfss://.../data/deltatable"
    
    # Read the Parquet data and rewrite it in Delta format
    df = spark.read.parquet(parquet_path)
    df.write.format("delta").save(delta_path)
    
    # Register the Delta files as a table in the Lake Database
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS deltatable
        USING DELTA
        LOCATION '{delta_path}'
    """)
    
