Edit

Share via


Configure Google BigQuery in a copy activity

This article outlines how to use the copy activity in a pipeline to copy data from and to Google BigQuery.

Supported configuration

For the configuration of each tab under copy activity, go to the following sections respectively.

General

Refer to the General settings guidance to configure the General settings tab.

Source

The following properties are supported for Google BigQuery under the Source tab of a copy activity.

The following properties are required:

  • Connection: Select a Google BigQuery connection from the connection list. If no connection exists, then create a new Google BigQuery connection.

  • Use query: Select from Table or GoogleSQL.

    • If you select Table:

      • Table: Specify the name of the Google BigQuery table. Select the table from the drop-down list or select Edit to manually enter it.

        Screenshot showing Use query - Table.

    • If you select GoogleSQL:

      • GoogleSQL: Use the custom SQL query to read data. An example is SELECT * FROM MyTable. For more information, go to Query syntax.

        Screenshot showing Use query - Google SQL.

Under Advanced, you can specify the following fields:

  • Additional columns: Add additional data columns to store source files' relative path or static value. Expression is supported for the latter.

Destination (Preview)

The following properties are supported for Google BigQuery under the Destination tab of a copy activity.

The following properties are required:

  • Connection: Select a Google BigQuery connection from the connection list. If no connection exists, then create a new Google BigQuery connection.

  • Table: Specify the name of the Google BigQuery table. Select the table from the drop-down list or select Edit to manually enter it.

Under Advanced, you can specify the following fields:

  • Pre-copy script: Specify a script using GoogleSQL for copy activity to execute before writing data into destination table in each run. You can use this to clean up the pre-loaded data.

  • Write batch timeout: Specify the wait time for the batch insert operation to finish before it times out. The allowed value is timespan. The default value is 00:30:00 (30 minutes).

  • Write batch size: Specify the number of rows to insert into the Google BigQuery table per batch. The allowed value is integer (number of rows). The default value is 10000.

Mapping

For Mapping tab configuration, see Configure your mappings under mapping tab.

Settings

For Settings tab configuration, go to Configure your other settings under settings tab.

Data type mapping for Google BigQuery

When copying data from Google BigQuery, the following mappings are used from Google BigQuery data types to interim data types used by the service internally.

Google BigQuery data type Interim data type
JSON String
STRING String
BYTES Byte array
INTEGER Int64
FLOAT Double
NUMERIC Decimal
BIGNUMERIC String
BOOLEAN Boolean
TIMESTAMP DateTimeOffset
DATE DateTime
TIME TimeSpan
DATETIME DateTimeOffset
GEOGRAPHY String
RECORD/STRUCT String
ARRAY String

When copying data to Google BigQuery, the following mappings are used from interim data types used by the service internally to Google BigQuery data types.

Interim data type Google BigQuery data type
Boolean BOOLEAN
Byte array BYTES
Date DATE
DateTime DATETIME
DateTimeOffset TIMESTAMP
Decimal NUMERIC
Float-point FLOAT
GUID STRING
Integer INTEGER
String BIGNUMERIC, STRING, JSON, GEOGRAPHY, RECORD/STRUCT, ARRAY, RANGE
Time TIME
TimeSpan INTEGER

Table summary

The following tables contain more information about the copy activity in Google BigQuery.

Source information

Name Description Value Required JSON script property
Connection Your connection to the source data store. < your Google BigQuery connection > Yes connection
Connection type Your connection type. Select Google BigQuery. Google BigQuery Yes /
Use query The way to read data. Apply Table to read data from the specified table or apply GoogleSQL to read data using SQL queries. Table
GoogleSQL
Yes /
For Table
dataset name Name of the Google BigQuery dataset. < your dataset name > No (if GoogleSQL is specified) dataset
table name Name of the table. < your table name > No (if Google SQL is specified) table
For GoogleSQL
GoogleSQL Use the custom SQL query to read data. An example is SELECT * FROM MyTable. For more information, go to Query syntax. < GoogleSQL query > No (if Table is specified) query
Additional columns Add additional data columns to store source files' relative path or static value. Expression is supported for the latter. • Name
• Value
No additionalColumns:
• name
• value

Destination information (Preview)

Name Description Value Required JSON script property
Connection Your connection to the destination data store. < your Google BigQuery connection > Yes connection
dataset name Name of the Google BigQuery dataset. < your dataset name > Yes dataset
table name Name of the table. < your table name > Yes table
Pre-copy script A GoogleSQL for copy activity to execute before writing data into destination table in each run. You can use this to clean up the pre-loaded data. < GoogleSQL query > No preCopyScript
Write behavior The write behavior for copy activity to load data into Google BigQuery. insert (default) No writeBehavior
Write batch timeout Wait time for the batch insert operation to complete before it times out. Allowed values are timespan. < timespan >
00:30:00 (default)
No writeBatchTimeout
Write batch size The number of rows to insert into the Google BigQuery table per batch. The allowed value is integer (number of rows). < integer >
10000 (default)
No writeBatchSize