Intermittent connectivity issues while establishing a JDBC connection to Azure SQL Server from Databricks

Akshay Kudale 20 Reputation points
2025-07-24T06:19:54.73+00:00

I’m connecting to Azure SQL Database from Databricks using a JDBC connection authenticated via a Service Principal (Azure AD).

This connection is part of an hourly Databricks workflow, which writes logs and accesses metadata in Azure SQL. In most runs, the connection is successfully established. However, intermittently, I encounter the following error during the authentication step:

update_audit_log function failed with exception: HTTPSConnectionPool(host='login.windows.net', port=443): Max retries exceeded with url: /[REDACTED]/oauth2/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f5111edded0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

Could you please advise on how to permanently address this issue?

Below are more details regarding connectivity:

| Aspect | Details |
| --- | --- |
| Connection Type | JDBC |
| Authentication | Azure Active Directory (Service Principal) via client credentials grant (OAuth 2.0) using adal |
| Target Database | Azure SQL Database |
| Used In | PySpark, by accessing JVM-based java.sql.DriverManager via spark.sparkContext._gateway |
| Security | Access token (accessToken) is used instead of a username/password |

Azure SQL Database

Accepted answer
  1. Pratyush Vashistha 190 Reputation points Microsoft External Staff Moderator
    2025-07-24T07:53:41.79+00:00

    Hello Akshay Kudale!

    Thank you for reaching out to the Microsoft Q&A platform. Happy to answer your question.

    This "Network is unreachable" error during authentication to login.windows.net (Azure AD) from Databricks, especially when using a Service Principal, is a classic sign of intermittent network connectivity or DNS resolution issues from the Databricks cluster's perspective. Because it is intermittent, it points to temporary disruptions rather than a permanent block.

    In the background, your Databricks cluster needs to reach Azure AD's authentication endpoint (login.windows.net on port 443) to obtain the access token for your Service Principal. The HTTPSConnectionPool: Max retries exceeded error, combined with Network is unreachable, means the cluster could not establish even a basic TCP connection to Azure AD, so the authentication request was never sent.

    This could be due to:

    • Transient Network Glitches: Temporary routing issues or packet loss within Azure's network or between the Databricks cluster nodes (where the token request is made) and Azure AD.

    • DNS Resolution Issues: Intermittent problems where the Databricks cluster struggles to resolve login.windows.net to its IP address.

    • Outbound Firewall/NSG Rules: Less likely if it's intermittent (a permanent block would be consistent), but worth verifying if some network paths are sometimes blocked.
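
To tell these causes apart when a run fails, a small probe can be run from a notebook cell on the same cluster. The following is a minimal sketch, not an official diagnostic: `probe_endpoint` and its labels are hypothetical names, and the `resolve`/`connect` parameters exist only so the function can be exercised without network access.

```python
import errno
import socket


def probe_endpoint(host="login.windows.net", port=443, timeout=5.0,
                   resolve=socket.getaddrinfo,
                   connect=socket.create_connection):
    """Return a short label describing how far a connection attempt got."""
    try:
        # Step 1: DNS lookup. A failure here points at the resolver.
        resolve(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return f"dns_failure: {exc}"

    try:
        # Step 2: TCP handshake only (no TLS, no HTTP request).
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        if exc.errno == errno.ENETUNREACH:
            # Same condition as "[Errno 101] Network is unreachable" on Linux:
            # the OS had no route to the resolved address.
            return "network_unreachable"
        return f"tcp_failure: {exc}"

    conn.close()
    return "ok"
```

Calling `probe_endpoint()` at the start of the hourly job and logging the result would show whether failed runs stop at name resolution or at the TCP connection.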

    How to Permanently Address This Issue

    The solution involves ensuring robust and consistent network outbound connectivity from your Databricks cluster to Azure AD.

    1. Verify Databricks VNet Injection and Network Configuration:

    • If your Databricks workspace is deployed with VNet Injection (which is recommended for advanced networking control), the network configuration of the VNet is critical.

    o Check NSG Rules: Ensure the Network Security Groups (NSGs) applied to the Databricks subnets (public and private) have outbound rules allowing traffic to Azure Active Directory. This typically means:

    a) Destination: Service Tag AzureActiveDirectory

    b) Destination Port: 443 (HTTPS)

    c) Protocol: TCP

    d) Action: Allow

    e) Ensure the priority of this rule is higher (lower number) than any deny rules that might capture this traffic.

    o User-Defined Routes (UDRs) / Firewall Appliances: If you're routing outbound traffic through a firewall appliance (e.g., Azure Firewall) or using custom UDRs, ensure that login.windows.net (or the AzureActiveDirectory service tag) is explicitly allowed egress through that appliance/route. Intermittent failures can happen if the appliance is under load or has transient issues.

    o For more detail, see these references:

    1. Databricks documentation on Network Security Group rules: User-defined route settings for Azure Databricks (This is crucial for VNet-injected workspaces).
    2. Azure Service Tags overview (for AzureActiveDirectory): Azure Service Tags overview
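
The priority point in the NSG rule above (item e) can be illustrated with a toy model. This sketch only mimics the documented behaviour that NSG rules are processed in ascending priority number and the first match wins. It does not call any Azure API, it treats a service tag as a plain label rather than a set of IP ranges, and the rule names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str          # hypothetical rule name
    priority: int      # lower number is evaluated first
    destination: str   # service tag label, or "*" for any
    port: str          # port as text, or "*" for any
    action: str        # "Allow" or "Deny"


def evaluate(rules, destination, port):
    """Return the first rule that matches, scanning by ascending priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.destination in ("*", destination) and rule.port in ("*", str(port)):
            return rule
    return None


good = [
    Rule("allow-aad-out", 100, "AzureActiveDirectory", "443", "Allow"),
    Rule("deny-all-out", 200, "*", "*", "Deny"),
]
shadowed = [
    Rule("allow-aad-out", 300, "AzureActiveDirectory", "443", "Allow"),
    Rule("deny-all-out", 200, "*", "*", "Deny"),
]

print(evaluate(good, "AzureActiveDirectory", 443).action)
print(evaluate(shadowed, "AzureActiveDirectory", 443).action)
```

In the second rule set the allow rule exists but is never reached, because the broader deny rule has the lower priority number.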

    2. DNS Resolution Check:

    o Ensure your Databricks VNet (if injected) is configured with reliable DNS servers. If you're using custom DNS (e.g., Azure DNS private zones or your own DNS servers), verify their stability and ability to resolve public endpoints like login.windows.net consistently. Intermittent DNS resolution failures can also break token acquisition, although they usually surface as a name-resolution error rather than as [Errno 101] Network is unreachable.

    o Action: Test DNS resolution from within a Databricks notebook if possible (e.g., using a %sh dig login.windows.net command to see resolution times and success rates).

    o For more detail, see this reference:

    1. Azure DNS overview (for VNet DNS settings): What is Azure DNS?
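
If dig is not available on the cluster image, the DNS test from step 2 can be approximated in Python. This is a rough sketch under that assumption: `dns_check` is a hypothetical helper, and the `resolve` and `sleep` parameters are only there so it can be tested offline. Local resolver caching may hide some failures, so treat a clean result as weak evidence.

```python
import socket
import time


def dns_check(host="login.windows.net", attempts=20, pause=0.5,
              resolve=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` repeatedly and summarise failures and lookup times."""
    failures = 0
    durations = []
    for i in range(attempts):
        start = time.perf_counter()
        try:
            resolve(host, 443, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            failures += 1
        else:
            durations.append(time.perf_counter() - start)
        if i < attempts - 1:
            sleep(pause)  # space the lookups out a little
    return {
        "attempts": attempts,
        "failures": failures,
        "max_seconds": max(durations, default=0.0),
    }
```

Running `dns_check()` in a notebook cell a few times across the day, and comparing the failure count with the times the workflow failed, gives a first indication of whether DNS is involved.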

    3. Implement Retry Logic in your PySpark Code (Workaround/Mitigation):

    o While the above steps address the root cause, for highly critical workflows, implementing retry logic with exponential backoff around your update_audit_log function (or the token acquisition part) can make your workflow more resilient to transient network issues. This won't fix the underlying problem but makes your jobs more robust.
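
A minimal sketch of such a retry wrapper is shown below. The name `call_with_backoff` and its parameters are illustrative, not part of adal or Databricks. It retries on OSError by default, on the assumption that the connection error raised by the requests library derives from it; adjust `retry_on` if your token call raises something else.

```python
import random
import time


def call_with_backoff(func, *, attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable error wait 1s, 2s, 4s, ... (plus jitter)."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps parallel jobs from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

It would wrap only the token acquisition, for example `token = call_with_backoff(get_token)`, where `get_token` stands in for whatever function currently calls adal in your code. Retrying the whole update_audit_log function is also possible, but only if it is safe to run twice.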

    4. Monitor Databricks Cluster Health and Logs:

    o Regularly check Databricks cluster logs and metrics for any signs of network instability or resource contention that might lead to these intermittent issues.

    Given that it's intermittent and specifically affecting login.windows.net, the most likely culprit is an NSG rule or UDR issue that isn't always active, or a transient DNS problem. Start by thoroughly reviewing your Databricks VNet's outbound NSG rules for the AzureActiveDirectory service tag.

    Please also check out these similar issues: [Issue 1](https://learn.microsoft.com/en-us/answers/questions/1065925/error-httpsconnectionpool(host-login-microsoftonli) and Issue 2. The right approach is to ensure that all required endpoints are allowed through the firewall.

    Please "Accept as Answer" and Upvote if the answer provided is useful, so that you can help others in the community looking for remediation for similar issues.
