ADF Azure Databricks linked service: job cluster using a policyId that specifies pool IDs

James Lee 0 Reputation points
2024-11-22T22:32:26.85+00:00

Hi, how can I set up an Azure Databricks linked service that uses a new job cluster, where the job policy specifies the driver and worker instance pool IDs?

In the linked service definition, I have selected "New job cluster", which requires me to supply a Cluster Node Type and Cluster Driver Node Type.

I am then using dynamic JSON content to specify a "policyId" key under typeProperties, as documented here.
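Roughly, the relevant part of that dynamic JSON looks like this (values redacted; the node type keys are present because the UI requires them):

    {
        "typeProperties": {
            "policyId": "<policy-id>",
            "newClusterVersion": "<runtime-version>",
            "newClusterNumOfWorker": "1",
            "newClusterNodeType": "<node-type>",
            "newClusterDriverNodeType": "<driver-node-type>"
        }
    }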

When I attempt to use the linked service, I get this error:

Operation on target my_activity failed: The field 'node_type_id' cannot be supplied when an instance pool ID is provided.

I then removed the "newClusterNodeType" and "newClusterDriverNodeType" keys from typeProperties, and received this error:

Operation on target my_activity failed: Databricks LinkedService should specify an existing interactive cluster ID, or an existing instance pool ID, or new cluster information for creation.

I cannot use the "Existing Instance pool" option because I do not have permissions to view the pools for security reasons. I must use the policy ID provided to me.


1 answer
  1. Vinodh247 24,091 Reputation points MVP
    2024-11-23T10:07:57.2233333+00:00

    The issue you're encountering arises because the ADF Azure Databricks linked service requires proper alignment between the policyId and the cluster configuration parameters. Here's how to configure the linked service to use a job policy that specifies the driver and worker pool IDs without hitting those errors:

    Steps to Configure the Linked Service

    1. Set the policyId only (no newClusterNodeType or newClusterDriverNodeType). A job policy in Databricks defines the allowed configurations for a job cluster. When you specify a policyId, the policy enforces settings such as driver and worker node types, instance pool IDs, and other cluster configuration. Supplying node_type_id (which is what newClusterNodeType maps to) alongside a policy that provides instance pools therefore conflicts, which is exactly the first error you saw.

    In your case:

      • Remove newClusterNodeType and newClusterDriverNodeType entirely from the typeProperties JSON.
      • Ensure that the policy associated with the policyId includes instance pool settings for both the driver and worker nodes, as in the sketch below.
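    For reference, a cluster policy that pins both pools might contain entries like the following. This is only a sketch: instance_pool_id and driver_instance_pool_id are standard Databricks cluster-policy attributes, but the pool IDs are placeholders and your actual policy is whatever your admin has defined.
         {
             "instance_pool_id": {
                 "type": "fixed",
                 "value": "<worker-pool-id>",
                 "hidden": true
             },
             "driver_instance_pool_id": {
                 "type": "fixed",
                 "value": "<driver-pool-id>",
                 "hidden": true
             }
         }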
    2. Dynamic JSON example. Here's how your typeProperties section in the linked service should look when using a policyId. (JSON does not allow // comments, so none are included; newClusterVersion is the Databricks runtime version, and accessToken is supplied as a SecureString.)
         {
             "type": "AzureDatabricks",
             "typeProperties": {
                 "domain": "https://<databricks-instance>.azuredatabricks.net",
                 "accessToken": {
                     "type": "SecureString",
                     "value": "<your-databricks-access-token>"
                 },
                 "policyId": "<your-policy-id>",
                 "newClusterVersion": "<runtime-version>",
                 "newClusterNumOfWorker": "<number-of-workers>"
             }
         }
    3. Ensure policy compliance. Verify with your Databricks administrator that:
      • The specified policyId includes instance pool settings for the driver and worker nodes.
      • The policy allows flexibility in setting the number of workers and runtime version, as these may need to be specified dynamically in your ADF pipeline; see the example constraints below.
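    For example, a policy that leaves those two settings flexible might use entries like these (the range bounds and runtime versions are illustrative placeholders, not values from your workspace):
         {
             "num_workers": {
                 "type": "range",
                 "minValue": 1,
                 "maxValue": 8
             },
             "spark_version": {
                 "type": "allowlist",
                 "values": ["14.3.x-scala2.12", "15.4.x-scala2.12"]
             }
         }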
    4. Permissions check. If you do not have visibility into instance pools for security reasons, you can still use the policyId to enforce instance pool usage. However, ensure the policy includes these settings:
      • instance_pool_id (and driver_instance_pool_id, if the driver uses a separate pool) for the worker and driver nodes.
      • Any constraints that restrict or specify acceptable configurations.
    5. Test the configuration. After updating the linked service, trigger a minimal pipeline that uses it; a sketch follows below. If the error persists, ask your Databricks admin to verify that the job policy is compatible with the ADF pipeline requirements.
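    A minimal smoke-test pipeline needs only a single Databricks Notebook activity. In this sketch, the linked service name AzureDatabricksLS and the notebook path /Shared/smoke-test are placeholders you would replace with your own:
         {
             "name": "TestDatabricksLinkedService",
             "properties": {
                 "activities": [
                     {
                         "name": "RunSmokeTestNotebook",
                         "type": "DatabricksNotebook",
                         "linkedServiceName": {
                             "referenceName": "AzureDatabricksLS",
                             "type": "LinkedServiceReference"
                         },
                         "typeProperties": {
                             "notebookPath": "/Shared/smoke-test"
                         }
                     }
                 ]
             }
         }
    If this run succeeds, the linked service is resolving the policy correctly; if it fails with the same node_type_id error, the policy and the supplied cluster properties still conflict.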
