Intermittent "Can't reach database server" on Azure Cosmos DB for PostgreSQL (port 6432) from Azure Functions

Question


Pavle Zikic 10

We're running an Azure Functions app (Node.js, timer-triggered) that uses Prisma to connect to an Azure Cosmos DB for PostgreSQL cluster. Intermittently, our function fails with a connection error indicating the database server is unreachable. The database appears to go offline briefly and then recovers on its own, but this has been happening more frequently lately.

Environment

Azure Functions (Node.js, timer trigger)
Prisma Client (@prisma/client)
Azure Cosmos DB for PostgreSQL (coordinator connection on port 6432 / connection pooling endpoint)
Region: [add your region]

Error (recurring)

Invalid prisma.queueJob.findMany() invocation

Can't reach database server at c-<cluster>.<id>.postgres.cosmos.azure.com:6432

Please make sure your database server is running at c-<cluster>.<id>.postgres.cosmos.azure.com:6432.

PrismaClientKnownRequestError
    at Mn.handleRequestError (@prisma/client/runtime/library.js)
    at Mn.request (@prisma/client/runtime/library.js)

Wrapped at the Functions host level as:

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.queue_worker
 ---> RpcException: Result: Failure

What we've observed

The failures are transient — the same function succeeds on subsequent runs.
No deployment or config change correlates with the onset; frequency has simply increased over time.
The endpoint uses the pooled connection port 6432.

Questions for the community / Microsoft

Are there known causes of brief coordinator-node unavailability on Azure Cosmos DB for PostgreSQL (e.g., maintenance windows, failovers, automatic scaling, node restarts) that would produce short "can't reach server" windows on port 6432?
Is port 6432 (managed PgBouncer/pooler) more susceptible to these drops than the direct 5432 port, and is one recommended over the other for serverless/Functions workloads?
What is the recommended way to diagnose whether these are node restarts/failovers vs. client-side connection pool exhaustion? Which metrics/logs should we check (e.g., in Azure Monitor / cluster metrics)?
Best-practice guidance for resilient connections from Azure Functions + Prisma (connection limits, timeouts, retry strategy) against this service?

Any pointers appreciated. We also plan to open a support ticket with Azure for the underlying availability investigation.

SAI JAGADEESH KUDIPUDI 3,630 Reputation points Microsoft External Staff Moderator

2026-07-04T01:14:10.7833333+00:00
Based on the symptoms, this appears to be a transient connectivity issue rather than a configuration problem. Short connection interruptions can occur during maintenance activities, infrastructure events, coordinator failovers, or when the cluster is under resource pressure. Applications connecting to Azure Cosmos DB for PostgreSQL should be designed to handle transient connection failures with retry logic.
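As an assumption-based sketch of what "handle transient connection failures" can mean in code: before retrying, classify the error. The message in the question corresponds to Prisma's documented error code P1001, and a few neighbouring codes describe similar connection-level problems. The `isTransientDbError` name and the exact code list are illustrative choices, not something the service or Prisma prescribes; the check duck-types the `code` property so the block runs without `@prisma/client`.

```typescript
// Prisma error codes treated here as connection-level (transient) failures.
// This selection is an assumption for illustration; tune it to your own logs.
const TRANSIENT_PRISMA_CODES: ReadonlySet<string> = new Set([
  "P1001", // Can't reach database server
  "P1002", // Database server was reached but timed out
  "P1008", // Operations timed out
  "P1017", // Server has closed the connection
  "P2024", // Timed out fetching a new connection from the connection pool
]);

// Duck-typed so it compiles without importing @prisma/client: Prisma's known
// request errors carry a string `code` property such as "P1001".
function isTransientDbError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_PRISMA_CODES.has(code);
}
```

Query errors such as a unique-constraint violation (P2002) are deliberately left out, because retrying them would only repeat the same failure.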

6432 vs 5432

Port 6432 (PgBouncer): Recommended for Azure Functions and other serverless workloads because it reduces connection overhead and supports a larger number of client connections.

Port 5432 (Direct): Useful for troubleshooting, but direct connections can be more susceptible to connection exhaustion as concurrency increases.
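On the Prisma side, the port choice is just part of `DATABASE_URL`. The hypothetical `buildDatabaseUrl` helper below assumes Prisma's PostgreSQL connection-string parameters `sslmode`, `pgbouncer`, `connection_limit`, `connect_timeout`, and `pool_timeout`, and assumes the managed PgBouncer on 6432 runs in transaction mode (the case where Prisma needs `pgbouncer=true`). The defaults (a pool of 1 and 10-second timeouts) are illustrative starting points, not service recommendations.

```typescript
// Hypothetical helper: builds a Prisma DATABASE_URL for either endpoint.
interface DbUrlOptions {
  host: string; // e.g. "c-<cluster>.<id>.postgres.cosmos.azure.com"
  user: string;
  password: string;
  database: string;
  pooled: boolean; // true: managed PgBouncer on 6432, false: direct on 5432
  connectionLimit?: number; // Prisma pool size per PrismaClient instance
  connectTimeoutSec?: number; // Prisma connect_timeout, in seconds
  poolTimeoutSec?: number; // Prisma pool_timeout, in seconds
}

function buildDatabaseUrl(opts: DbUrlOptions): string {
  const port = opts.pooled ? 6432 : 5432;
  const params = new URLSearchParams({ sslmode: "require" });
  if (opts.pooled) {
    // Tells Prisma it is talking to PgBouncer in transaction mode
    // (assumed to be how the managed pooler is configured).
    params.set("pgbouncer", "true");
  }
  params.set("connection_limit", String(opts.connectionLimit ?? 1));
  params.set("connect_timeout", String(opts.connectTimeoutSec ?? 10));
  params.set("pool_timeout", String(opts.poolTimeoutSec ?? 10));
  // Percent-encode credentials so characters like "@" do not break the URL.
  const auth = `${encodeURIComponent(opts.user)}:${encodeURIComponent(opts.password)}`;
  return `postgresql://${auth}@${opts.host}:${port}/${encodeURIComponent(opts.database)}?${params.toString()}`;
}
```

Keeping both variants behind one flag turns a 5432 comparison during troubleshooting into a configuration change rather than a code change.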

How to diagnose

Review Azure Monitor metrics such as Active Connections, CPU %, Memory %, IOPS, and Network traffic around the failure timestamps.

Check Azure Service Health and Activity Logs for maintenance or failover events that coincide with the failures.

Review application logs to determine whether the failures are connection timeouts, connection drops, or connection pool exhaustion.
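For the application-log check, a structured record per failure makes the cases easier to tell apart and to line up with Azure Monitor and Service Health. The sketch below is hypothetical (`describeDbFailure` and the category names are invented for illustration): it maps a few documented Prisma error codes to coarse categories and stamps each record with an ISO 8601 UTC time, which is also the format worth handing to Azure Support.

```typescript
type FailureCategory = "unreachable" | "timeout" | "dropped" | "pool_exhausted" | "other";

interface DbFailureRecord {
  timestampUtc: string; // ISO 8601 in UTC, for correlating with Azure Monitor
  category: FailureCategory;
  code: string | null; // Prisma error code, if the error carried one
  message: string;
}

// Assumed mapping from Prisma error codes to diagnostic categories.
function categorize(code: string | null): FailureCategory {
  switch (code) {
    case "P1001": return "unreachable"; // can't reach database server
    case "P1002": // reached but timed out
    case "P1008": return "timeout"; // operation timed out
    case "P1017": return "dropped"; // server closed the connection
    case "P2024": return "pool_exhausted"; // no free connection in Prisma's pool
    default: return "other";
  }
}

function describeDbFailure(err: unknown, now: Date = new Date()): DbFailureRecord {
  let code: string | null = null;
  let message = String(err);
  if (typeof err === "object" && err !== null) {
    const e = err as { code?: unknown; message?: unknown };
    if (typeof e.code === "string") code = e.code;
    if (typeof e.message === "string") message = e.message;
  }
  return { timestampUtc: now.toISOString(), category: categorize(code), code, message };
}
```

In the function's catch block, the record can be written with `context.log(JSON.stringify(describeDbFailure(err)))`; a run of `pool_exhausted` points at the client side, while `unreachable` or `dropped` clustered in the same minutes points at the server or pooler.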

Best practices

Implement retry logic with exponential backoff (3–5 retries).

Reuse a single PrismaClient instance across function invocations.

Avoid connection bursts during Function startup or rapid scale-out events.

Continue using the pooled endpoint (6432) for Azure Functions unless troubleshooting identifies a specific issue with the pooler.
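The first two practices can be combined roughly as follows. This is a sketch, not a library API: `withRetry`, `lazySingleton`, and the option names are assumptions. The retry predicate is passed in (for example, a check that accepts only Prisma's connection-level codes), and the backoff uses "full jitter", a random delay up to an exponentially growing cap, which also spreads reconnect attempts out instead of creating a burst.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // backoff cap before the first retry
  maxDelayMs: number; // upper bound for any single backoff
  isRetryable: (err: unknown) => boolean; // e.g. a check on Prisma error codes
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Retries `op` with exponential backoff and full jitter: before retry n
// (n = 1, 2, ...) it waits a random time in [0, min(maxDelayMs, baseDelayMs * 2^(n-1))).
async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt >= opts.maxAttempts || !opts.isRetryable(err)) throw err;
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * cap);
    }
  }
}

// Creates the value on first use and returns the same instance afterwards.
// At module scope this survives across invocations on a warm Functions worker.
function lazySingleton<T>(factory: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) instance = factory();
    return instance;
  };
}
```

With Prisma this might be wired up as `const getPrisma = lazySingleton(() => new PrismaClient());` at module scope and `await withRetry(() => getPrisma().queueJob.findMany(), { maxAttempts: 4, baseDelayMs: 200, maxDelayMs: 5000, isRetryable })`, where `isRetryable` accepts only connection-level errors such as P1001. Retrying a read like `findMany` is safe; writes should only be retried if they are idempotent.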

If the issue occurs frequently, lasts longer than a typical transient window, or happens multiple times per day, collect the exact UTC timestamps and cluster details and engage Azure Support for a deeper backend investigation.

References

Troubleshoot connection issues to Azure Cosmos DB for PostgreSQL

Connection pooling in Azure Cosmos DB for PostgreSQL

Monitor and tune Azure Cosmos DB for PostgreSQL

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


Answer 1

Cause: Yes. Short outages can happen due to failovers, maintenance, scaling, or PgBouncer (6432) restarts. Your pattern is consistent with transient platform events.

6432 vs 5432

6432 (pooler): slightly more drops, but recommended for Functions

5432 (direct): more stable, but at risk of connection exhaustion. Stay on 6432.

How to diagnose

Check:

Azure Monitor: node restarts, failover events, connection drops

Activity log: maintenance/failover

Metrics: connections, CPU spikes

App logs: timeout vs immediate failure
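To make the "timeout vs immediate failure" check measurable, the failing call can be timed. Both cases can surface as the same "Can't reach database server" error, but a failure that takes roughly the configured connect timeout suggests the endpoint was not answering (for example during a failover), while one that returns in well under a second suggests the connection was actively refused or reset. That reading, the hypothetical `timeFailure` helper, and its thresholds are assumptions; the clock is injectable so the logic can be tested.

```typescript
type FailureTiming = "immediate" | "near_timeout" | "in_between";

interface TimedFailure {
  elapsedMs: number;
  timing: FailureTiming;
  error: unknown;
}

type TimedResult<T> = { ok: true; value: T } | { ok: false; failure: TimedFailure };

// Assumed thresholds: "immediate" means under one second, "near_timeout"
// means at least 80% of the client's configured connect timeout.
const IMMEDIATE_MS = 1000;
const NEAR_TIMEOUT_RATIO = 0.8;

// Runs `op`; on failure, reports how long it took to fail so logs can
// separate fast rejections from waits that ran into the timeout.
function timeFailure<T>(
  op: () => Promise<T>,
  connectTimeoutMs: number,
  now: () => number = () => Date.now(),
): Promise<TimedResult<T>> {
  const start = now();
  return op().then(
    (value): TimedResult<T> => ({ ok: true, value }),
    (error: unknown): TimedResult<T> => {
      const elapsedMs = now() - start;
      let timing: FailureTiming = "in_between";
      if (elapsedMs < IMMEDIATE_MS) timing = "immediate";
      else if (elapsedMs >= NEAR_TIMEOUT_RATIO * connectTimeoutMs) timing = "near_timeout";
      return { ok: false, failure: { elapsedMs, timing, error } };
    },
  );
}
```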

Best practices

Retry (must): 3–5 attempts with backoff

Low connection limit (e.g., Prisma's connection_limit parameter)

Use pooler (6432)

Avoid burst connections on startup

Set sane timeouts (5–10s)
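The "sane timeouts" point can be applied at two levels: Prisma's `connect_timeout` and `pool_timeout` connection-string parameters bound the connection phase, and an application-level deadline bounds the whole call. The sketch below shows only the second, as a hypothetical `withTimeout` helper built on `Promise.race`. It stops waiting after the deadline but does not cancel the query on the server, so it complements rather than replaces the driver-level timeouts.

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Races `op` against a timer. The timer is cleared either way so it does not
// keep the event loop alive. The underlying database query is NOT cancelled.
async function withTimeout<T>(op: () => Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  try {
    return await Promise.race([op(), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

For example, `await withTimeout(() => prisma.queueJob.findMany(), 10_000)`; whether a `TimeoutError` is then retried should depend on whether the operation is safe to repeat.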

Normal transient behaviour. Fix it with retries and connection tuning, not by switching ports.