Intermittent "TCP Provider, error: 35 – Connection reset by peer" on Azure Function — connection opens, no command ever received by SQL Server

Question

alexbf 1

Platform: Azure Functions (Isolated Worker, .NET 8), Microsoft.Data.SqlClient, SQL Server (on-prem, accessed over private VNET)

Trigger: Timer-triggered Function (TimerTrigger)

Symptom

An Azure Function fails intermittently with:

Microsoft.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server.

(provider: TCP Provider, error: 35 - An internal exception was caught)

Inner: Connection reset by peer

The error occurs at the client side after approximately 40 seconds

SQL Server monitoring (sys.dm_exec_sessions) shows a session was opened for that connection, but no command was ever received
The problem is intermittent: the same code succeeds most of the time and fails occasionally
Restarting the Function App can both trigger and fix the problem
The error is caught by a retry loop (up to 4 attempts with exponential backoff). It only surfaces as a visible failure when all 4 attempts fail

What has been ruled out

Network / SQL Server side: A health check Function on the same app hits the same database every 5 minutes and has never failed. Same connection string, same server. So the server and network path are not systematically broken.
Connection pooling: Tested with both Pooling=True and Pooling=False. No change in behavior.
Singleton/static connection: All repositories are registered as Scoped. SqlConnection is always a local await using var — never a field.
Sync-over-async deadlock: No .Result or .Wait() anywhere in the execution path.

What the failing Function does (simplified)

The Function orchestrates a multi-page sync from an external HTTP API into SQL Server. For each page of ~100 products, it splits them into sub-batches of 50 and calls a repository method that opens one connection and executes one stored proc per row in a loop:

// Called up to 4x by ExecuteBatchWithRetry on transient errors
public async Task<BulkUpsertResult> UpsertBulkAsync(
    List<InventoryLookupModels> items,
    CancellationToken ct)
{
    var mapped    = MapInventory(items);
    var sanitized = SanitizeInventory(mapped);

    await using var conn = new SqlConnection(_connectionString);
    await conn.OpenAsync(ct);

    foreach (var item in sanitized)
    {
        ct.ThrowIfCancellationRequested();

        await using var cmd = new SqlCommand("dbo.usp_Upsert_Single", conn)
        {
            CommandType    = CommandType.StoredProcedure,
            CommandTimeout = 60
        };

        AddParameters(cmd, item);

        cmd.Parameters.Add(new SqlParameter("@Inserted", SqlDbType.Int) { Direction = ParameterDirection.Output });
        cmd.Parameters.Add(new SqlParameter("@Updated",  SqlDbType.Int) { Direction = ParameterDirection.Output });

        await cmd.ExecuteNonQueryAsync(ct);
        // accumulate results...
    }

    return new BulkUpsertResult { ... };
}

The retry wrapper catches network errors and retries silently:

private async Task<BulkUpsertResult> ExecuteBatchWithRetry(List<InventoryLookupModels> batch)
{
    const int maxRetry = 4;
    Exception? lastException = null;
    for (int attempt = 1; attempt <= maxRetry; attempt++)
    {
        try
        {
            return await _repo.UpsertBulkAsync(batch, CancellationToken.None);
        }
        catch (OperationCanceledException)
        {
            lastException = new TimeoutException("SQL batch timed out");
        }
        catch (Exception ex) when (attempt < maxRetry)
        {
            var isNetworkRelated =
                ex is IOException || ex is SocketException || ex is TimeoutException ||
                ex.Message.Contains("transport-level",  StringComparison.OrdinalIgnoreCase) ||
                ex.Message.Contains("reset by peer",    StringComparison.OrdinalIgnoreCase);

            // isTransientSql is computed elsewhere (not shown in this simplified excerpt)
            if (!isNetworkRelated && !isTransientSql) throw;

            var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt))
                      + TimeSpan.FromMilliseconds(Random.Shared.Next(500, 2000));
            await Task.Delay(delay);
        }
    }
    throw lastException!;
}

What the working health check does

await using var conn = new SqlConnection(connectionString);
await conn.OpenAsync();
await new SqlCommand("SELECT 1", conn).ExecuteScalarAsync();

One query, milliseconds, never fails. Same connection string.

Connection string (sanitized)

Server=x.x.x.x;Database=MyDB;User Id=svc_user;Password=***;
Encrypt=True;TrustServerCertificate=True;
Pooling=False;
Connection Timeout=60;
ConnectRetryCount=3;ConnectRetryInterval=10;
MultipleActiveResultSets=False;
Packet Size=8192;

The specific mystery

SQL Server sees a login/session established. No command arrives. ~40 seconds later the client receives "reset by peer."

There are no awaits between conn.OpenAsync() and the first cmd.ExecuteNonQueryAsync() for the first iteration of the loop — only ~32 synchronous parameter assignments taking microseconds. So the 40-second gap is not happening in user code between Open and Execute.

Questions:

Does Microsoft.Data.SqlClient open the TCP connection before OpenAsync() returns, or can a physical RST during the login phase (after TCP connect) explain what SQL Server sees?
Would ConnectRetryCount=3; ConnectRetryInterval=10 cause SqlClient to open a new physical TCP connection on each retry, making SQL Server observe multiple transient sessions?
Is there anything else in this setup that could explain the issue? Thread pool starvation? Something in the Azure Functions host?
Any experience with this specific error (TCP error 35) in Azure Functions with private VNET SQL Server?

Platform: Azure Functions (Isolated Worker, .NET 8), Microsoft.Data.SqlClient, SQL Server (in another, peered VNET, accessed over private VNET)

Trigger: Timer-triggered Function (TimerTrigger)

Answer 1

Hi @alexbf ,

One thing you may want to try is simplifying the SQL connection string and explicitly using the TCP format, as this has resolved similar intermittent connectivity issues in some environments.

You could test changing from:

Server=<myserver>,1433;Initial Catalog=<myDatabase>;Persist Security Info=False;User ID=<MyId>;Password=<Password>;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=True;Packet Size=32767

to:

Server=tcp:<myserver>;Initial Catalog=<myDatabase>;User ID=<MyId>;Password=<Password>;Packet Size=32767;TrustServerCertificate=True;

The tcp: prefix makes the client use the TCP protocol explicitly instead of leaving protocol selection to the driver. In some environments (especially over VNET/private routing paths), this may help avoid intermittent transport-level resets.

Does Microsoft.Data.SqlClient open TCP before OpenAsync() returns, and can RST explain SQL Server seeing a session?

OpenAsync() does not return until the connection is fully established, which includes the TCP handshake, the TDS pre-login exchange, the TLS handshake (since Encrypt=True), and the SQL Server login.

So, if SQL Server shows a session in sys.dm_exec_sessions, it means the connection successfully completed login.

A physical TCP RST can still occur immediately after this point (or during early post-login), which would result in:

SQL Server seeing a session briefly created

No subsequent command arriving

Client reporting “connection reset by peer”

So yes, this pattern is consistent with a post-login abort/reset rather than a failed OpenAsync().
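
To confirm where the roughly 40 seconds elapse, each step can be timed separately. The following is a minimal sketch, not the asker's code: TimedAsync and the log delegate are hypothetical names introduced here, and the demo step at the bottom is only a stand-in for the real SqlClient calls.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Runs one async step and reports how long it took, even when the step throws.
// The exception is rethrown unchanged so existing retry logic still sees it.
static async Task TimedAsync(string step, Func<Task> action, Action<string> log)
{
    var sw = Stopwatch.StartNew();
    try
    {
        await action();
        log($"{step} ok after {sw.ElapsedMilliseconds} ms");
    }
    catch (Exception ex)
    {
        log($"{step} failed after {sw.ElapsedMilliseconds} ms: {ex.GetType().Name}: {ex.Message}");
        throw;
    }
}

// Stand-in step for the demo. In the repository method, the lambdas would wrap
// conn.OpenAsync(ct) and the first cmd.ExecuteNonQueryAsync(ct) instead.
await TimedAsync("demo step", () => Task.Delay(10), Console.WriteLine);
```

If the open step completes in milliseconds and the first execute fails after about 40 seconds, the reset is happening after login, which matches the session-without-command picture on the server.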

Does ConnectRetryCount=3 create new TCP connections per retry and multiple sessions in SQL Server?

Yes, but only during the connection establishment phase inside OpenAsync().

If a transient failure occurs during login:

SqlClient will retry by creating a new physical TCP connection each attempt

SQL Server will see multiple short-lived login sessions for failed attempts

Only the final successful attempt results in a stable session

However, in your scenario (where a session exists but no command arrives), this does not align with ConnectRetry behavior, because retries happen before the connection is handed to application code.

Could anything else explain this (thread pool starvation, Azure Functions host behavior, etc.)?

Yes, and these are actually more likely than SqlClient internals:

Azure Functions / Host-level factors:

Worker process restart or recycle (common in isolated .NET worker)

Timer trigger execution being interrupted

Scale-in/scale-out events

Memory pressure leading to host shutdown

Host or worker process restart mid-execution

Any of these can leave a SQL connection that was established successfully and then aborted abruptly before the first command executes.
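
One way to test the host-restart theory is to pass the function's cancellation token down the call chain (the retry wrapper in the question passes CancellationToken.None) and log the moment it fires. This is a minimal sketch: LogOnShutdown is a hypothetical helper, and it assumes the Functions runtime supplies a CancellationToken that is signaled on shutdown; the demo uses a local CancellationTokenSource in its place.

```csharp
using System;
using System.Threading;

// Logs the moment a host-provided token is cancelled, so an aborted run can be
// correlated with the SQL error timestamps. Dispose the return value when done.
static CancellationTokenRegistration LogOnShutdown(CancellationToken hostToken, Action<string> log) =>
    hostToken.Register(() => log($"{DateTime.UtcNow:O} cancellation requested by host"));

// Demo: a local source stands in for the token the Functions runtime would
// supply (an assumption of this sketch).
using var cts = new CancellationTokenSource();
using var registration = LogOnShutdown(cts.Token, Console.WriteLine);
cts.Cancel(); // the registered callback runs here and writes one line
```

If a cancellation line appears shortly before the reset, the host is tearing the invocation down; if it never appears, the network path becomes the stronger suspect.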

Thread pool starvation (secondary possibility):

If the .NET thread pool is saturated, continuation after OpenAsync() or before ExecuteNonQueryAsync() can be delayed
This delay can indirectly cause network intermediaries to reset the connection due to inactivity or perceived stall
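
Thread pool pressure can be checked directly rather than inferred. The sketch below uses only System.Threading.ThreadPool members available in .NET 8; DescribeThreadPool is a hypothetical helper name, and where to call it (for example just before OpenAsync and just before the first command) is left to the reader.

```csharp
using System;
using System.Threading;

// Captures a point-in-time view of the .NET thread pool. A large or growing
// pending count while the thread count stays flat suggests starvation.
static string DescribeThreadPool()
{
    ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);
    ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
    return $"threads={ThreadPool.ThreadCount} pending={ThreadPool.PendingWorkItemCount} " +
           $"busyWorkers={maxWorkers - freeWorkers} busyIo={maxIo - freeIo}";
}

Console.WriteLine(DescribeThreadPool());
```

A pending count near zero at the time of a failure would make starvation an unlikely explanation for the gap.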

Any known experience with TCP error 35 in Azure Functions + VNET SQL Server?

This error pattern is commonly observed with Azure Functions (especially timer-triggered or burst workloads), private or peered VNET connectivity to SQL Server, and long-lived connections that execute commands sequentially.

Typical root causes reported include:

Function host recycling or cold start interruptions

Network device / firewall / NAT idle or inspection timeouts

Intermittent TCP resets in VNET routing paths

Thread pool pressure under burst execution leading to delayed SQL packet send

Importantly, this is usually not a SQL Server issue, since the server-side session exists and is ready but never receives the first request payload.

Pravallika KV 16,365 Reputation points Microsoft External Staff Moderator

2026-05-25T15:07:04.8933333+00:00

@alexbf, following up to see whether the answer above was helpful. If it resolves your question, please click Accept Answer and upvote it. If you have any further questions, let us know.

Answer 2

Behavior of OpenAsync and what SQL Server sees

For ADO.NET clients (including Microsoft.Data.SqlClient), the connection is fully established (TCP + pre‑login + login) before SqlConnection.Open / OpenAsync returns. If SQL Server shows a session in sys.dm_exec_sessions, the login phase has completed and the server has accepted the connection.

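A TCP reset during or after login is surfaced by the client as a network error from the TCP provider. The error number and text vary by platform: the question shows error 35 with "Connection reset by peer", while on Windows the same condition typically appears as error number 0 with a message like: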

A transport-level error has occurred when sending/receiving the request to/from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)

These are classified as network connection termination errors and are typically due to:

Temporary unavailability of the database or elastic pool
Problems in the network infrastructure between client and server (firewalls, appliances, routers) that may drop or reset connections, sometimes based on idleness or other policies

In this case, SQL Server seeing a session but no command, followed by a reset, is consistent with a network device or the client stack closing the connection after login but before the first TDS command is processed.

Effect of ConnectRetryCount / ConnectRetryInterval

For .NET clients, ConnectRetryCount and ConnectRetryInterval control the built‑in connection retry behavior for transient errors during connection establishment and for idle connection resiliency:

For a new connection:
- The first failure is detected after the initial Open attempt.
- Retries occur up to ConnectRetryCount times, with ConnectRetryInterval seconds between retries.
- Each retry is a new physical connection attempt.
- The connection timeout must be large enough to allow all retries: Connection Timeout >= ConnectRetryCount * ConnectRetryInterval
For an idle connection (broken while idle, detected on next command):
- Retry 1 happens immediately when the broken connection is detected.
- Subsequent retries are spaced by ConnectRetryInterval.
- The Command Timeout must be large enough to allow retries: Command Timeout > (ConnectRetryCount - 1) * ConnectRetryInterval

So yes, when a transient failure occurs during connection open, SqlClient will perform multiple physical TCP connection attempts, and SQL Server can observe multiple short‑lived sessions corresponding to those retries.
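
The two timeout inequalities above can be checked mechanically. This is a minimal sketch that uses the generic DbConnectionStringBuilder from System.Data.Common, so it runs without any NuGet package. RetryBudgetFits and ReadInt are hypothetical helper names, and the fallback values are assumptions about SqlClient defaults that should be verified against the version in use. The generic builder does not resolve keyword synonyms, so it looks for "Connection Timeout" exactly as written in the question.

```csharp
using System;
using System.Data.Common;

// Reads an integer keyword from the parsed connection string, or a fallback if absent.
static int ReadInt(DbConnectionStringBuilder csb, string keyword, int fallback) =>
    csb.TryGetValue(keyword, out var raw) ? Convert.ToInt32(raw) : fallback;

// Applies both documented inequalities:
//   Connection Timeout >= ConnectRetryCount * ConnectRetryInterval
//   Command Timeout    >  (ConnectRetryCount - 1) * ConnectRetryInterval
static (bool openFits, bool idleFits) RetryBudgetFits(string connectionString, int commandTimeoutSeconds)
{
    var csb = new DbConnectionStringBuilder { ConnectionString = connectionString };

    // Fallbacks are assumed defaults, not values confirmed by the source.
    int count = ReadInt(csb, "ConnectRetryCount", 1);
    int interval = ReadInt(csb, "ConnectRetryInterval", 10);
    int connectTimeout = ReadInt(csb, "Connection Timeout", 15);

    bool openFits = connectTimeout >= count * interval;
    bool idleFits = commandTimeoutSeconds > (count - 1) * interval;
    return (openFits, idleFits);
}

var result = RetryBudgetFits(
    "Server=x.x.x.x;Connection Timeout=60;ConnectRetryCount=3;ConnectRetryInterval=10",
    commandTimeoutSeconds: 60);
Console.WriteLine($"open retries fit: {result.openFits}, idle retries fit: {result.idleFits}");
```

With the values from the question (Connection Timeout=60, ConnectRetryCount=3, ConnectRetryInterval=10, CommandTimeout=60), both checks pass: 60 >= 3 * 10 and 60 > (3 - 1) * 10. The timeouts are therefore not cutting the built-in retries short.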

What in this setup can explain the intermittent resets

From the documented behavior of Azure SQL / SQL Server connectivity and transient faults:

Transient errors are expected in cloud / networked environments and often correlate with load balancing, resource reconfiguration, or network devices dropping connections.
Network connection termination errors (including “connection reset by peer”) are raised when the TCP provider detects that the remote host or an intermediate device has closed the connection unexpectedly.
These errors may be transient or permanent; applications should use a fixed number of retry attempts before treating them as permanent failures.

Specific contributing factors that match the scenario:

Network devices between Azure Functions and on‑prem SQL (firewalls, VPN gateways, routers) can:
- Reset connections during or shortly after login.
- Apply policies based on perceived idleness or connection patterns.
- Cause intermittent failures that do not affect all clients equally.
Client‑side stack issues:
- Antivirus or network filter drivers on the client side can delay or interfere with packets, leading to timeouts or resets.
- Slow DNS or security APIs invoked before sending the initial SYN can cause connection timeouts.
Server‑side performance issues (less likely given the health check, but still possible intermittently):
- Poor performance, blocking, or resource pressure can delay handling of login or subsequent commands, causing client timeouts.

The documentation recommends:

Implementing robust retry logic for transient errors (already done in the sample code).
Logging all errors over time to detect patterns.
Using diagnostics such as:
- Client/server traces
- Network traces to see which side is resetting the connection
- Tools like PerfView or SQL Network Analyzer when deeper analysis is needed
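
Because the retry wrapper in the question retries silently, failed attempts leave no trail unless all four fail. The sketch below shows one way to log every attempt and to classify resets by exception type instead of message text. IsConnectionReset and DescribeAttempt are hypothetical helpers, and the sketch assumes a SocketException is present somewhere in the InnerException chain, as the "Inner: Connection reset by peer" line suggests but does not guarantee.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Walks the InnerException chain looking for a socket-level connection reset,
// instead of matching on message text (which varies by platform).
static bool IsConnectionReset(Exception ex)
{
    for (var current = ex; current is not null; current = current.InnerException)
    {
        if (current is SocketException { SocketErrorCode: SocketError.ConnectionReset })
            return true;
    }
    return false;
}

// One line per failed attempt, so silent retries can still be correlated
// with server-side and network traces.
static string DescribeAttempt(int attempt, Exception ex) =>
    $"{DateTime.UtcNow:O} attempt={attempt} reset={IsConnectionReset(ex)} {ex.GetType().Name}: {ex.Message}";

// Demo with a hand-built exception chain standing in for a real SqlException.
var sample = new IOException("read failed", new SocketException((int)SocketError.ConnectionReset));
Console.WriteLine(DescribeAttempt(1, sample));
```

Calling DescribeAttempt from each catch block and writing the line to the function's logger would show how often the first, second, and third attempts fail, which is the pattern data the guidance above asks for.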

Experience with this pattern in Azure / guidance

The documented guidance for intermittent “connection reset by peer” / TCP provider errors in SQL connectivity scenarios is:

Treat them as transient faults and retry a fixed number of times.
If both client and server traces show the server closing connections unexpectedly, investigate server performance and resource issues.
If both traces show the client closing connections or a network device in the middle, investigate:
- Firewalls or routers that drop idle or long‑lived connections
- TLS / cipher suite configuration and compatibility
- Antivirus or other network filter drivers on the client side
For Azure‑hosted clients, also validate:
- VNET configuration and any network appliances in the path
- Port and firewall rules

In this scenario, the behavior (session created, no command, reset after tens of seconds, intermittent, affected by app restarts) aligns with a transient network termination between Azure Functions and the on‑prem SQL Server rather than a deterministic bug in Microsoft.Data.SqlClient. The recommended path is to:

Continue using retry logic for these TCP provider errors.
Collect coordinated client/server/network traces during failures to identify whether the reset originates from the client, server, or an intermediate device.
Review and adjust network infrastructure (firewalls, VPN, load balancers) and TLS/cipher configuration if traces indicate resets or handshake issues.
