Applies to: Databricks SQL
Databricks Runtime
For rules governing how conflicts between data types are resolved, see SQL data type rules.
Supported data types
Azure Databricks supports the following data types:
Data Type | Description |
---|---|
BIGINT | Represents 8-byte signed integer numbers. |
BINARY | Represents byte sequence values. |
BOOLEAN | Represents Boolean values. |
DATE | Represents values comprising values of fields year, month and day, without a time-zone. |
DECIMAL(p,s) | Represents numbers with maximum precision p and fixed scale s . |
DOUBLE | Represents 8-byte double-precision floating point numbers. |
FLOAT | Represents 4-byte single-precision floating point numbers. |
INT | Represents 4-byte signed integer numbers. |
INTERVAL intervalQualifier | Represents intervals of time either on a scale of seconds or months. |
VOID | Represents the untyped NULL. |
SMALLINT | Represents 2-byte signed integer numbers. |
STRING | Represents character string values. |
TIMESTAMP | Represents values comprising values of fields year, month, day, hour, minute, and second, with the session local timezone. |
TIMESTAMP_NTZ | Represents values comprising values of fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account. |
TINYINT | Represents 1-byte signed integer numbers. |
GEOGRAPHY(srid) | Represents geography values whose coordinate reference system is geographic (longitude and latitude in degrees) and is defined by the srid value. If srid is set to ANY the coordinate reference system is not hardcoded in the type and becomes a runtime value. |
GEOMETRY(srid) | Represents geometry values whose coordinate reference system is understood as Cartesian and is defined by the srid value. If srid is set to ANY the coordinate reference system is not hardcoded in the type and becomes a runtime value. |
ARRAY < elementType > | Represents values comprising a sequence of elements with the type of elementType . |
MAP < keyType,valueType > | Represents values comprising a set of key-value pairs. |
STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] > | Represents values with the structure described by a sequence of fields. |
VARIANT | Represents semi-structured data. |
OBJECT | Represents values in a VARIANT with the structure described by a set of fields. |
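As a rough illustration of the byte widths listed for the integral types in the table above, the following Python sketch derives the value range of each type, assuming two's-complement signed integers. The names INTEGRAL_BYTES, signed_range, and fits are hypothetical helpers for this example and are not part of any Databricks API.

```python
# Byte widths copied from the table above.
INTEGRAL_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(sql_type, value):
    """Check whether an integer value lies inside the range of sql_type."""
    low, high = signed_range(INTEGRAL_BYTES[sql_type])
    return low <= value <= high


if __name__ == "__main__":
    # Print the derived range of every integral type.
    for name, width in INTEGRAL_BYTES.items():
        print(name, signed_range(width))
```

A check such as fits("SMALLINT", 40000) returns False under these assumptions, which is the kind of out-of-range value to catch before writing data.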
Important
Delta Lake does not support the VOID type.
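To make the precision p and scale s of DECIMAL(p,s) in the table above more concrete, this sketch uses Python's decimal module to check whether a value can be held without rounding. It assumes p counts all digits and s counts the digits after the decimal point. fits_decimal is a hypothetical helper; it does not reproduce how Azure Databricks rounds or rejects values on cast.

```python
from decimal import Decimal


def fits_decimal(value, precision, scale):
    """Return True if value is representable as DECIMAL(precision, scale)
    without rounding (assumption: at most `scale` fractional digits and
    at most `precision - scale` integer digits)."""
    if not value.is_finite():
        return False  # NaN and infinities are treated as not representable
    if value.is_zero():
        return True
    # normalize() strips trailing zeros, so 1.500 is treated like 1.5.
    _, digits, exponent = value.normalize().as_tuple()
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fractional_digits <= scale and integer_digits <= precision - scale


if __name__ == "__main__":
    print(fits_decimal(Decimal("123.45"), 5, 2))
    print(fits_decimal(Decimal("123.45"), 4, 2))
```

Under these assumptions, 123.45 fits DECIMAL(5,2) but not DECIMAL(4,2), because only two digits remain for the integer part.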
Data type classification
Data types are grouped into the following classes:
Integral numeric types
Integral numeric types represent whole numbers:
Exact numeric types
Exact numeric types represent base-10 numbers:
Binary floating point types
Binary floating point types use exponents and a binary representation to cover a large range of numbers:
Numeric types
Numeric types represent all numeric data types:
Date-time types
Date-time types represent date and time components:
Geospatial types
Geospatial types represent geometric or geographic objects:
Simple types
Simple types are types defined by holding singleton values:
Complex types
Complex types are composed of multiple components of complex or simple types:
Language mappings
Applies to: Databricks Runtime
Scala
Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:
import org.apache.spark.sql.types._
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | Byte | ByteType |
SMALLINT | ShortType | Short | ShortType |
INT | IntegerType | Int | IntegerType |
BIGINT | LongType | Long | LongType |
FLOAT | FloatType | Float | FloatType |
DOUBLE | DoubleType | Double | DoubleType |
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DecimalType |
STRING | StringType | String | StringType |
BINARY | BinaryType | Array[Byte] | BinaryType |
BOOLEAN | BooleanType | Boolean | BooleanType |
TIMESTAMP | TimestampType | java.sql.Timestamp | TimestampType |
TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | TimestampNTZType |
DATE | DateType | java.sql.Date | DateType |
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
GEOGRAPHY(srid) | GeographyType | org.apache.spark.unsafe.type.GeographyVal | GeographyType |
GEOMETRY(srid) | GeometryType | org.apache.spark.unsafe.type.GeometryVal | GeometryType |
ARRAY | ArrayType | scala.collection.Seq | ArrayType(elementType [, containsNull]). (2) |
MAP | MapType | scala.collection.Map | MapType(keyType, valueType [, valueContainsNull]). (2) |
STRUCT | StructType | org.apache.spark.sql.Row | StructType(fields). fields is a Seq of StructField. (4) |
StructField | The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType [, nullable]). (4) | |
VARIANT | VariantType | org.apache.spark.unsafe.type.VariantVal | VariantType |
OBJECT | Not Supported | Not supported | Not supported |
Java
Spark SQL data types are defined in the package org.apache.spark.sql.types. To access or create a data type, use factory methods provided in org.apache.spark.sql.types.DataTypes.
SQL type | Data Type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | byte or Byte | DataTypes.ByteType |
SMALLINT | ShortType | short or Short | DataTypes.ShortType |
INT | IntegerType | int or Integer | DataTypes.IntegerType |
BIGINT | LongType | long or Long | DataTypes.LongType |
FLOAT | FloatType | float or Float | DataTypes.FloatType |
DOUBLE | DoubleType | double or Double | DataTypes.DoubleType |
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale) |
STRING | StringType | String | DataTypes.StringType |
BINARY | BinaryType | byte[] | DataTypes.BinaryType |
BOOLEAN | BooleanType | boolean or Boolean | DataTypes.BooleanType |
TIMESTAMP | TimestampType | java.sql.Timestamp | DataTypes.TimestampType |
TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | DataTypes.TimestampNTZType |
DATE | DateType | java.sql.Date | DataTypes.DateType |
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
GEOGRAPHY(srid) | GeographyType | org.apache.spark.unsafe.type.GeographyVal | GeographyType |
GEOMETRY(srid) | GeometryType | org.apache.spark.unsafe.type.GeometryVal | GeometryType |
ARRAY | ArrayType | java.util.List | DataTypes.createArrayType(elementType [, containsNull]). (2) |
MAP | MapType | java.util.Map | DataTypes.createMapType(keyType, valueType [, valueContainsNull]). (2) |
STRUCT | StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields). fields is a List or array of StructField. (4) |
StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable). (4) | |
VARIANT | VariantType | org.apache.spark.unsafe.type.VariantVal | VariantType |
OBJECT | Not Supported | Not supported | Not supported |
Python
Spark SQL data types are defined in the package pyspark.sql.types. You access them by importing the package:
from pyspark.sql.types import *
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | int or long. (1) | ByteType() |
SMALLINT | ShortType | int or long. (1) | ShortType() |
INT | IntegerType | int or long | IntegerType() |
BIGINT | LongType | long (1) | LongType() |
FLOAT | FloatType | float (1) | FloatType() |
DOUBLE | DoubleType | float | DoubleType() |
DECIMAL(p,s) | DecimalType | decimal.Decimal | DecimalType() |
STRING | StringType | string | StringType() |
BINARY | BinaryType | bytearray | BinaryType() |
BOOLEAN | BooleanType | bool | BooleanType() |
TIMESTAMP | TimestampType | datetime.datetime | TimestampType() |
TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
DATE | DateType | datetime.date | DateType() |
year-month interval | YearMonthIntervalType | Not supported | Not supported |
day-time interval | DayTimeIntervalType | datetime.timedelta | DayTimeIntervalType (3) |
GEOGRAPHY(srid) | GeographyType | GeographyVal | GeographyType() |
GEOMETRY(srid) | GeometryType | GeometryVal | GeometryType() |
ARRAY | ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]). (2) |
MAP | MapType | dict | MapType(keyType, valueType, [valueContainsNull]). (2) |
STRUCT | StructType | list or tuple | StructType(fields). fields is a list of StructField. (4) |
StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]). (4) | |
VARIANT | VariantType | VariantVal | VariantType() |
OBJECT | Not Supported | Not supported | Not supported |
R
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | integer (1) | 'byte' |
SMALLINT | ShortType | integer (1) | 'short' |
INT | IntegerType | integer | 'integer' |
BIGINT | LongType | integer (1) | 'long' |
FLOAT | FloatType | numeric (1) | 'float' |
DOUBLE | DoubleType | numeric | 'double' |
DECIMAL(p,s) | DecimalType | Not supported | Not supported |
STRING | StringType | character | 'string' |
BINARY | BinaryType | raw | 'binary' |
BOOLEAN | BooleanType | logical | 'bool' |
TIMESTAMP | TimestampType | POSIXct | 'timestamp' |
TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
DATE | DateType | Date | 'date' |
year-month interval | YearMonthIntervalType | Not supported | Not supported |
day-time interval | DayTimeIntervalType | Not supported | Not supported |
GEOGRAPHY(srid) | Not supported | Not supported | Not supported |
GEOMETRY(srid) | Not supported | Not supported | Not supported |
ARRAY | ArrayType | vector or list | list(type='array', elementType=elementType, containsNull=[containsNull]). (2) |
MAP | MapType | environment | list(type='map', keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]). (2) |
STRUCT | StructType | named list | list(type='struct', fields=fields). fields is a list of StructField. (4) |
StructField | The value type of the data type of this field (for example, integer for a StructField with the data type IntegerType) | list(name=name, type=dataType, nullable=[nullable]). (4) | |
VARIANT | Not Supported | Not supported | Not supported |
OBJECT | Not Supported | Not supported | Not supported |
(1) Numbers are converted to the domain at runtime. Make sure that numbers are within range.
(2) The optional value defaults to TRUE.
(3) Interval types
YearMonthIntervalType([startField,] endField): Represents a year-month interval made up of a contiguous subset of the fields YEAR and MONTH. startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (YEAR) and 1 (MONTH).
DayTimeIntervalType([startField,] endField): Represents a day-time interval made up of a contiguous subset of the fields DAY, HOUR, MINUTE, and SECOND. startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
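The field numbering for day-time intervals in note (3) can be sketched in plain Python as follows. This is only an illustration of what a contiguous subset of fields means: day_time_fields is a hypothetical helper, not the DayTimeIntervalType constructor, and treating a single argument as a one-field interval is an assumption of this example.

```python
# Field indexes as listed in note (3): 0 (DAY), 1 (HOUR), 2 (MINUTE), 3 (SECOND).
DAY_TIME_FIELDS = ["DAY", "HOUR", "MINUTE", "SECOND"]


def day_time_fields(start_field, end_field=None):
    """Return the names of the contiguous fields from start_field to end_field.

    Assumption for this sketch: when end_field is omitted, the interval
    consists of the single field start_field.
    """
    if end_field is None:
        end_field = start_field
    last_index = len(DAY_TIME_FIELDS) - 1
    if not (0 <= start_field <= end_field <= last_index):
        raise ValueError(
            f"fields must satisfy 0 <= startField <= endField <= {last_index}"
        )
    # startField is the leftmost field and endField the rightmost one.
    return DAY_TIME_FIELDS[start_field:end_field + 1]


if __name__ == "__main__":
    print(day_time_fields(0, 3))
    print(day_time_fields(1, 2))
```

In this sketch, passing a start field greater than the end field raises an error, because the leftmost field cannot come after the rightmost one.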
(4) StructType
StructType(fields): Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.
StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable indicates whether values of this field can be null; fields are nullable by default.
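The two rules in note (4), unique field names and nullable as the default, can be modeled with a small standard-library sketch. Field and make_struct are hypothetical stand-ins written for this example; they are not the pyspark.sql.types classes and only mirror the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a StructField(name, dataType, nullable)."""
    name: str
    data_type: str
    nullable: bool = True  # values may be null unless stated otherwise


def make_struct(fields):
    """Return the fields as a tuple, rejecting duplicate field names."""
    seen = set()
    for field in fields:
        if field.name in seen:
            raise ValueError(f"duplicate field name: {field.name}")
        seen.add(field.name)
    return tuple(fields)


if __name__ == "__main__":
    schema = make_struct([
        Field("id", "BIGINT", nullable=False),
        Field("label", "STRING"),  # nullable by default
    ])
    for field in schema:
        print(field.name, field.data_type, field.nullable)
```

Calling make_struct with two fields named id raises an error in this sketch, reflecting the rule that two fields with the same name are not allowed.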