Compute configuration for Databricks Connect

Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

This page describes different ways of configuring a connection between Databricks Connect and your Azure Databricks cluster or serverless compute.

Databricks Connect enables you to connect popular IDEs such as Visual Studio Code, PyCharm, RStudio Desktop, IntelliJ IDEA, notebook servers, and other custom applications to Azure Databricks clusters. See Databricks Connect.

Setup

Before you begin, you need the following:

Databricks Connect installed. For installation requirements, see Databricks Connect usage requirements.
The Azure Databricks workspace instance name. This is the Server Hostname value for your compute. See Get connection details for an Azure Databricks compute resource.
If you are connecting to classic compute, the ID of your cluster. You can retrieve the cluster ID from the URL. See Compute resource URL and ID.

Configure a connection to a cluster

There are multiple ways to configure the connection to your cluster. Databricks Connect searches for configuration properties in the following order, and uses the first configuration it finds. For advanced configuration information, see Advanced usage of Databricks Connect.

The DatabricksSession class's remote() method
A Databricks configuration profile
The DATABRICKS_CONFIG_PROFILE environment variable
An environment variable for each configuration property
A Databricks configuration profile named DEFAULT
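
The lookup order above can be pictured with a small, self-contained sketch. It is only a model of the documented precedence, not how Databricks Connect is implemented: the function name `pick_config_source`, its parameters, and the returned labels are all hypothetical, and the check for per-property environment variables is simplified to `DATABRICKS_HOST` alone.

```python
from typing import Mapping, Optional, Set


def pick_config_source(
    remote_args: Optional[Mapping[str, str]] = None,
    profile_in_code: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    profiles_on_disk: Optional[Set[str]] = None,
) -> str:
    """Return a label for the first configuration source that is present.

    Hypothetical helper that mirrors the documented search order only.
    """
    env = env or {}
    profiles_on_disk = profiles_on_disk or set()

    # 1. Arguments passed to DatabricksSession.builder.remote().
    if remote_args:
        return "remote()"
    # 2. A configuration profile named explicitly in code.
    if profile_in_code:
        return f"profile:{profile_in_code}"
    # 3. The DATABRICKS_CONFIG_PROFILE environment variable.
    if env.get("DATABRICKS_CONFIG_PROFILE"):
        return f"profile:{env['DATABRICKS_CONFIG_PROFILE']}"
    # 4. One environment variable per property (simplified to the host).
    if env.get("DATABRICKS_HOST"):
        return "environment variables"
    # 5. A configuration profile named DEFAULT.
    if "DEFAULT" in profiles_on_disk:
        return "profile:DEFAULT"
    return "none"
```

For example, calling the sketch with both `remote_args` and a `DATABRICKS_CONFIG_PROFILE` entry yields `"remote()"`, because the first source found wins.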

The `DatabricksSession` class's `remote()` method

This option applies only to Azure Databricks personal access token authentication (legacy). Specify the workspace instance name, the Azure Databricks personal access token, and the ID of the cluster.

You can initialize the DatabricksSession class in several ways:

Set the host, token, and cluster_id fields in DatabricksSession.builder.remote().
Use the Databricks SDK's Config class.
Specify a Databricks configuration profile along with the cluster_id field.

Instead of specifying these connection properties in your code, Databricks recommends configuring properties through environment variables or configuration files, as described throughout this section. The following code examples assume that you provide some implementation of the proposed retrieve_* functions to get the necessary properties from the user or from some other configuration store, such as Azure KeyVault.

The code for each of these approaches is as follows:

Python

# Set the host, token, and cluster_id fields in DatabricksSession.builder.remote.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host=f"https://{retrieve_workspace_instance_name()}",
    token=retrieve_token(),
    cluster_id=retrieve_cluster_id(),
).getOrCreate()

Scala

// Set the host, token, and clusterId fields in DatabricksSession.builder.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder()
    .host(retrieveWorkspaceInstanceName())
    .token(retrieveToken())
    .clusterId(retrieveClusterId())
    .getOrCreate()

Python

# Use the Databricks SDK's Config class.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    host=f"https://{retrieve_workspace_instance_name()}",
    token=retrieve_token(),
    cluster_id=retrieve_cluster_id(),
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Use the Databricks SDK's Config class.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setHost(retrieveWorkspaceInstanceName())
    .setToken(retrieveToken())
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()

Python

# Specify a Databricks configuration profile along with the `cluster_id` field.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    profile="<profile-name>",
    cluster_id=retrieve_cluster_id(),
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Specify a Databricks configuration profile along with the clusterId field.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()
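
The examples in this section call retrieve_* helpers that you are expected to supply. One possible shape for the Python helpers is sketched below, assuming a single "secret getter" callable that maps a secret name to its value. The factory `make_retrievers`, the secret names, and the placeholder values are all hypothetical; the getter could wrap a secrets store client, a prompt, or (as in the demo) a plain dictionary.

```python
from typing import Callable, Tuple

# A secret getter maps a secret name to its value.
SecretGetter = Callable[[str], str]
Retriever = Callable[[], str]


def make_retrievers(get_secret: SecretGetter) -> Tuple[Retriever, Retriever, Retriever]:
    """Build the three retrieve_* helpers on top of one secret getter."""

    def _fetch(name: str) -> str:
        value = get_secret(name).strip()
        if not value:
            raise ValueError(f"Secret {name!r} is empty")
        return value

    def retrieve_workspace_instance_name() -> str:
        # The Python examples prepend "https://", so return the bare host name.
        host = _fetch("databricks-workspace-instance-name")
        if host.startswith("https://"):
            host = host[len("https://"):]
        return host.rstrip("/")

    def retrieve_token() -> str:
        return _fetch("databricks-token")

    def retrieve_cluster_id() -> str:
        return _fetch("databricks-cluster-id")

    return retrieve_workspace_instance_name, retrieve_token, retrieve_cluster_id


# Demo with an in-memory store; every value here is a placeholder.
demo_store = {
    "databricks-workspace-instance-name": "https://adb-1234567890123456.7.azuredatabricks.net/",
    "databricks-token": "<personal-access-token>",
    "databricks-cluster-id": "<cluster-id>",
}
retrieve_workspace_instance_name, retrieve_token, retrieve_cluster_id = make_retrievers(
    demo_store.__getitem__
)
```

Keeping the store behind one callable means the session-building code does not change if you later move the values from a dictionary to a managed secrets service.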

A Databricks configuration profile

For this option, create or identify an Azure Databricks configuration profile containing the field cluster_id and any other fields that are necessary for the Databricks authentication type that you want to use.

The required configuration profile fields for each authentication type are as follows:

For Azure Databricks personal access token authentication: host and token.
For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
For OAuth user-to-machine (U2M) authentication (where supported): host.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
For Azure CLI authentication: host.
For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.
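
As an illustration of the personal access token case, the following sketch writes a profile with host, token, and cluster_id using Python's standard configparser module. It assumes the profile file uses the INI-style layout shown later on this page; the helper `write_profile`, the profile name, and the values are placeholders, and the demo writes to a temporary directory rather than to your real configuration file (typically .databrickscfg in your home directory).

```python
import configparser
import tempfile
from pathlib import Path
from typing import Mapping


def write_profile(path: Path, name: str, fields: Mapping[str, str]) -> None:
    """Add or replace one profile section in an INI-style profile file."""
    # Disable interpolation so values containing "%" are stored verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # A missing file is skipped without an error.
    parser[name] = dict(fields)
    with open(path, "w", encoding="utf-8") as handle:
        parser.write(handle)


# Demo: write a placeholder profile to a temporary location.
cfg_path = Path(tempfile.mkdtemp()) / ".databrickscfg"
write_profile(
    cfg_path,
    "my-cluster-profile",
    {
        "host": "https://<workspace-instance-name>",
        "token": "<personal-access-token>",
        "cluster_id": "<cluster-id>",
    },
)
print(cfg_path.read_text(encoding="utf-8"))
```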

Then set the name of this configuration profile through the configuration class.

You can specify cluster_id in a couple of ways:

Include the cluster_id field in your configuration profile, and then just specify the configuration profile's name.
Specify the configuration profile name along with the cluster_id field.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.

The code for each of these approaches is as follows:

Python

# Include the cluster_id field in your configuration profile, and then
# just specify the configuration profile's name:
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("<profile-name>").getOrCreate()

Scala

// Include the cluster_id field in your configuration profile, and then
// just specify the configuration profile's name:
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .getOrCreate()

Python

# Specify the configuration profile name along with the cluster_id field.
# In this example, retrieve_cluster_id() assumes some custom implementation that
# you provide to get the cluster ID from the user or from some other
# configuration store:
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    profile="<profile-name>",
    cluster_id=retrieve_cluster_id(),
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Specify a Databricks configuration profile along with the clusterId field.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()

The `DATABRICKS_CONFIG_PROFILE` environment variable

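For this option, create or identify an Azure Databricks configuration profile containing the field cluster_id and any other fields that are necessary for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.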

The required configuration profile fields for each authentication type are as follows:

For Azure Databricks personal access token authentication: host and token.
For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
For OAuth user-to-machine (U2M) authentication (where supported): host.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
For Azure CLI authentication: host.
For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Set the DATABRICKS_CONFIG_PROFILE environment variable to the name of this configuration profile. Then initialize the DatabricksSession class:

Python

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()

An environment variable for each configuration property

For this option, set the DATABRICKS_CLUSTER_ID environment variable and any other environment variables that are necessary for the Databricks authentication type that you want to use.

The required environment variables for each authentication type are as follows:

For Azure Databricks personal access token authentication: DATABRICKS_HOST and DATABRICKS_TOKEN.
For OAuth machine-to-machine (M2M) authentication (where supported): DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET.
For OAuth user-to-machine (U2M) authentication (where supported): DATABRICKS_HOST.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: DATABRICKS_HOST, ARM_TENANT_ID, ARM_CLIENT_ID, ARM_CLIENT_SECRET, and possibly DATABRICKS_AZURE_RESOURCE_ID.
For Azure CLI authentication: DATABRICKS_HOST.
For Azure managed identities authentication (where supported): DATABRICKS_HOST, ARM_USE_MSI, ARM_CLIENT_ID, and possibly DATABRICKS_AZURE_RESOURCE_ID.
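
A quick way to catch a missing variable before you create the session is to check the environment against the list above. The sketch below is a hypothetical helper, not part of Databricks Connect: the dictionary keys are labels invented for this example, and the optional DATABRICKS_AZURE_RESOURCE_ID variable is deliberately left out because it is not always required.

```python
from typing import Dict, List, Mapping

# Required variables per authentication type, taken from the list above.
# The keys are illustrative labels, not official identifiers.
REQUIRED_ENV_VARS: Dict[str, List[str]] = {
    "pat": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
    "oauth-m2m": ["DATABRICKS_HOST", "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET"],
    "oauth-u2m": ["DATABRICKS_HOST"],
    "entra-service-principal": [
        "DATABRICKS_HOST",
        "ARM_TENANT_ID",
        "ARM_CLIENT_ID",
        "ARM_CLIENT_SECRET",
    ],
    "azure-cli": ["DATABRICKS_HOST"],
    "azure-managed-identity": ["DATABRICKS_HOST", "ARM_USE_MSI", "ARM_CLIENT_ID"],
}


def missing_env_vars(
    auth_type: str, env: Mapping[str, str], need_cluster: bool = True
) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    required = list(REQUIRED_ENV_VARS[auth_type])
    if need_cluster:
        required.append("DATABRICKS_CLUSTER_ID")
    return [name for name in required if not env.get(name)]
```

In a script, you might call `missing_env_vars("pat", os.environ)` and stop with a clear message if the returned list is not empty.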

Then initialize the DatabricksSession class:

Python

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()

A Databricks configuration profile named `DEFAULT`

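For this option, create or identify an Azure Databricks configuration profile containing the field cluster_id and any other fields that are necessary for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.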

The required configuration profile fields for each authentication type are as follows:

For Azure Databricks personal access token authentication: host and token.
For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
For OAuth user-to-machine (U2M) authentication (where supported): host.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
For Azure CLI authentication: host.
For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Name this configuration profile DEFAULT.

Then initialize the DatabricksSession class:

Python

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()

Configure a connection to serverless compute

Databricks Connect for Python and Scala supports connecting to serverless compute. To use this feature, you must meet the version requirements for connecting to serverless compute. See Databricks Connect usage requirements.

Important

This feature has the following limitations:

Databricks Connect for Scala support for serverless compute is in Beta.
The Databricks Connect version and the version of Python or Scala must be compatible. See Databricks Connect versions.
All of the limitations of Databricks Connect for Python or Databricks Connect for Scala.
All of the serverless compute limitations.

For Python, you can configure a connection to serverless compute in your local environment in either of the following ways:

Set the local environment variable DATABRICKS_SERVERLESS_COMPUTE_ID to auto. If this environment variable is set, Databricks Connect ignores the cluster_id.
In a local Databricks configuration profile, set serverless_compute_id = auto, then reference that profile from your code.
```
[DEFAULT]
host = https://my-workspace.cloud.databricks.com/
serverless_compute_id = auto
token = dapi123...
```
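
The rule that the serverless setting takes priority over cluster_id can be modeled with a short sketch. This is a hypothetical helper for reasoning about your environment, not the library's own logic; it assumes the variable is set to the documented value auto.

```python
from typing import Mapping, Optional, Tuple


def select_compute(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Model which compute target the environment variables point to."""
    # Serverless wins: when this variable is set to "auto", the cluster ID is ignored.
    if env.get("DATABRICKS_SERVERLESS_COMPUTE_ID") == "auto":
        return ("serverless", None)
    cluster_id = env.get("DATABRICKS_CLUSTER_ID")
    if cluster_id:
        return ("cluster", cluster_id)
    return ("unconfigured", None)
```

Passing `os.environ` to the helper shows which target your current shell would select under this model.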

Alternatively, for Python or Scala, select serverless compute in code:

Python

from databricks.connect import DatabricksSession

# Option 1: use the serverless() builder method.
spark = DatabricksSession.builder.serverless().getOrCreate()

# Option 2: pass serverless=True to remote().
spark = DatabricksSession.builder.remote(serverless=True).getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().serverless().getOrCreate()

Validate the connection to Databricks

To validate that your environment, default credentials, and connection to compute are correctly set up for Databricks Connect, run the databricks-connect test command:

databricks-connect test

This command fails with a non-zero exit code and a corresponding error message when it detects any incompatibility in the setup, such as when the Databricks Connect version is incompatible with the Databricks serverless compute version. For Databricks Connect version support information, see Databricks Connect versions.
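
Because the command signals failure through its exit code, it is easy to wrap in a script or CI step. The following sketch assumes databricks-connect is on your PATH; the function `run_connection_test` and its return shape are illustrative choices, not part of Databricks Connect.

```python
import subprocess
from typing import Sequence, Tuple


def run_connection_test(
    command: Sequence[str] = ("databricks-connect", "test"),
) -> Tuple[bool, str]:
    """Run the validation command and return (succeeded, combined output)."""
    try:
        completed = subprocess.run(
            list(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # Merge error messages into the output.
            text=True,
            check=False,  # Inspect the exit code instead of raising.
        )
    except FileNotFoundError:
        return False, f"Command not found: {command[0]}"
    return completed.returncode == 0, completed.stdout
```

A caller can print the returned output and exit with a non-zero status when the first element is False, so that a failed validation stops the surrounding pipeline.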

In Databricks Connect 14.3 and above, you can also validate your environment using validateSession():

DatabricksSession.builder.validateSession(True).getOrCreate()

Disabling Databricks Connect

Databricks Connect (and the underlying Spark Connect) services can be disabled on any given cluster.

To disable the Databricks Connect service, set the following Spark configuration on the cluster:

spark.databricks.service.server.enabled false

Last updated on 2026-04-03
