We're running an Azure Functions app (Node.js, timer-triggered) that uses Prisma to connect to an Azure Cosmos DB for PostgreSQL cluster. Intermittently, our function fails with a connection error indicating the database server is unreachable. The database appears to go offline briefly and then recovers on its own, but this has been happening more frequently lately.
Environment
- Azure Functions (Node.js, timer trigger)
- Prisma Client (@prisma/client)
- Azure Cosmos DB for PostgreSQL (coordinator connection on port 6432 / connection pooling endpoint)
- Region: [add your region]
Error (recurring)
Invalid prisma.queueJob.findMany() invocation
Can't reach database server at c-<cluster>.<id>.postgres.cosmos.azure.com:6432
Please make sure your database server is running at c-<cluster>.<id>.postgres.cosmos.azure.com:6432.
PrismaClientKnownRequestError
at Mn.handleRequestError (@prisma/client/runtime/library.js)
at Mn.request (@prisma/client/runtime/library.js)
Wrapped at the Functions host level as:
Microsoft.Azure.WebJobs.Host.FunctionInvocationException:
Exception while executing function: Functions.queue_worker
---> RpcException: Result: Failure
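For triage, it may help to log which Prisma error code each failure carries, since "can't reach server" and "timed out waiting for a pooled connection" point to different causes. The sketch below assumes the thrown object exposes a `code` (or `errorCode`) string, as Prisma's error classes do. `classifyDbError` and its category names are hypothetical, and the code-to-cause mapping follows Prisma's published error reference rather than anything we have confirmed on this cluster.

```typescript
// Hypothetical triage helper; the category names are ours, not Prisma's.
type DbErrorKind =
  | "server-unreachable"
  | "server-timeout"
  | "connection-closed"
  | "pool-exhausted"
  | "other";

function classifyDbError(err: unknown): DbErrorKind {
  if (typeof err !== "object" || err === null) return "other";
  const e = err as { code?: unknown; errorCode?: unknown; message?: unknown };

  // Known-request errors carry `code`; initialization errors carry `errorCode`.
  const code =
    typeof e.code === "string" ? e.code :
    typeof e.errorCode === "string" ? e.errorCode :
    undefined;

  switch (code) {
    case "P1001": return "server-unreachable"; // Can't reach database server
    case "P1002": return "server-timeout";     // Server reached but timed out
    case "P1017": return "connection-closed";  // Server has closed the connection
    case "P2024": return "pool-exhausted";     // Timed out fetching a connection from the client pool
  }

  // Fallback for errors that arrive without a code: match the message text.
  const message = typeof e.message === "string" ? e.message : "";
  return message.includes("Can't reach database server")
    ? "server-unreachable"
    : "other";
}
```

Logging the returned kind alongside a timestamp on each failed run would let us line failures up against cluster-side events.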
What we've observed
- The failures are transient — the same function succeeds on subsequent runs.
- No deployment or config change correlates with the onset; frequency has simply increased over time.
- The endpoint uses the pooled connection port 6432.
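Because we connect through the pooled port, the connection string itself is a likely suspect. Below is a sketch of how the datasource URL could be assembled. `buildPoolerUrl` and its option names are hypothetical, and the default values are assumptions rather than our production settings. The query parameters (`sslmode`, `pgbouncer`, `connection_limit`, `connect_timeout`, `pool_timeout`) are ones Prisma documents for its PostgreSQL connector. Whether `pgbouncer=true` is still needed depends on the Prisma and PgBouncer versions in use.

```typescript
// Hypothetical builder for the Prisma datasource URL; all defaults are assumptions.
interface PoolerUrlOptions {
  host: string;
  user: string;
  password: string;
  database: string;
  port?: number;              // pooled endpoint port, 6432 in our setup
  connectionLimit?: number;   // Prisma client pool size per function instance
  connectTimeoutSec?: number; // seconds to wait when opening a connection
  poolTimeoutSec?: number;    // seconds to wait for a free pooled connection
}

function buildPoolerUrl(o: PoolerUrlOptions): string {
  const params = new URLSearchParams({
    sslmode: "require",
    pgbouncer: "true",
    connection_limit: String(o.connectionLimit ?? 1),
    connect_timeout: String(o.connectTimeoutSec ?? 10),
    pool_timeout: String(o.poolTimeoutSec ?? 10),
  });
  // Percent-encode credentials so characters like "@" or "/" do not break the URL.
  const auth = `${encodeURIComponent(o.user)}:${encodeURIComponent(o.password)}`;
  const db = encodeURIComponent(o.database);
  return `postgresql://${auth}@${o.host}:${o.port ?? 6432}/${db}?${params.toString()}`;
}
```

A small `connection_limit` is the usual suggestion for serverless workloads, since each function instance holds its own Prisma pool on top of the server-side pooler.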
Questions for the community / Microsoft
- Are there known causes of brief coordinator-node unavailability on Azure Cosmos DB for PostgreSQL (e.g., maintenance windows, failovers, automatic scaling, node restarts) that would produce short "can't reach server" windows on port 6432?
- Is port 6432 (managed PgBouncer/pooler) more susceptible to these drops than the direct 5432 port, and is one recommended over the other for serverless/Functions workloads?
- What is the recommended way to diagnose whether these are node restarts/failovers vs. client-side connection pool exhaustion? Which metrics/logs should we check (e.g., in Azure Monitor / cluster metrics)?
- What is the best-practice guidance for resilient connections from Azure Functions + Prisma (connection limits, timeouts, retry strategy) against this service?
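For context on the retry question, this is roughly the wrapper we are considering: a minimal sketch of exponential backoff with full jitter that retries only on the "can't reach server" case. `withRetry`, `isTransientConnectionError`, and the default option values are hypothetical names and numbers of ours. The check for code `P1001` assumes that is the code Prisma attaches to this error, per its error reference.

```typescript
// Hypothetical retry wrapper; names and defaults are assumptions, not a Prisma API.
interface RetryOptions {
  retries: number;     // maximum number of retries after the first attempt
  baseDelayMs: number; // backoff base
  maxDelayMs: number;  // backoff cap
}

function isTransientConnectionError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; errorCode?: unknown; message?: unknown };
  if (e.code === "P1001" || e.errorCode === "P1001") return true;
  return typeof e.message === "string" &&
    e.message.includes("Can't reach database server");
}

async function withRetry<T>(
  op: () => Promise<T>,
  opts: RetryOptions = { retries: 3, baseDelayMs: 200, maxDelayMs: 5000 },
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await op();
    } catch (err) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (attempt >= opts.retries || !isTransientConnectionError(err)) throw err;
      // Full jitter: sleep a random time in [0, min(cap, base * 2^attempt)).
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, Math.random() * ceiling));
      attempt++;
    }
  }
}
```

In the function body this would wrap the failing call, for example `withRetry(() => prisma.queueJob.findMany())`. Since the timer trigger reruns the function anyway, the retry budget can stay small.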
Any pointers appreciated. We also plan to open a support ticket with Azure for the underlying availability investigation.