Events
Apr 8, 3 PM - May 28, 7 AM
Sharpen your AI skills and enter the sweepstakes to win a free Certification exam
Register now!This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
In this tutorial, you'll learn how to analyze data with serverless SQL pool.
Serverless SQL pools let you use SQL without having to reserve capacity. Billing for a serverless SQL pool is based on the amount of data processed to run the query and not the number of nodes used to run the query.
Every workspace comes with a preconfigured serverless SQL pool called Built-in.
Note
Make sure you have placed the sample data into the primary storage account
In the Synapse Studio, go to the Develop hub
Create a new SQL script.
Paste the following code into the script. (Update contosolake
to the name of your storage account and users
with the name of your container.)
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
FORMAT='PARQUET'
) AS [result]
Select Run.
Data exploration is just a simplified scenario where you can understand the basic characteristics of your data. Learn more about data exploration and analysis in this tutorial.
You can browse the content of the files directly via master
database. For some simple data exploration scenarios, you don't need to create a separate database.
However, as you continue data exploration, you might want to create some utility objects, such as:
Use the master
database to create a separate database for custom database objects. Custom database objects can't be created in the master
database.
CREATE DATABASE DataExplorationDB
COLLATE Latin1_General_100_BIN2_UTF8
Important
Use a collation with _UTF8
suffix to ensure that UTF-8 text is properly converted to VARCHAR
columns. Latin1_General_100_BIN2_UTF8
provides the best performance in the queries that read data from Parquet files and Azure Cosmos DB containers. For more information on changing collations, see Collation types supported for Synapse SQL.
Switch the database context from master
to DataExplorationDB
using the following command. You can also use the UI control use database to switch your current database:
USE DataExplorationDB
From DataExplorationDB
create utility objects such as credentials and data sources.
CREATE EXTERNAL DATA SOURCE ContosoLake
WITH ( LOCATION = 'https://contosolake.dfs.core.windows.net')
Note
An external data source can be created without a credential. If a credential does not exist, the caller's identity will be used to access the external data source.
Optionally, use the newly created DataExplorationDB
database to create a login for a user in DataExplorationDB
that will access external data:
CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';
Next create a database user in DataExplorationDB
for the above login and grant the ADMINISTER DATABASE BULK OPERATIONS
permission.
CREATE USER data_explorer FOR LOGIN data_explorer;
GO
GRANT ADMINISTER DATABASE BULK OPERATIONS TO data_explorer;
GO
Explore the content of the file using the relative path and the data source:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK '/users/NYCTripSmall.parquet',
DATA_SOURCE = 'ContosoLake',
FORMAT='PARQUET'
) AS [result]
Publish your changes to the workspace.
Data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a Logical Data Warehouse - a relational layer built on top of Azure data sources. Learn more about building a logical data warehouse in this tutorial.
Events
Apr 8, 3 PM - May 28, 7 AM
Sharpen your AI skills and enter the sweepstakes to win a free Certification exam
Register now!