Function App deadlock when merge >10k entities to azure data table

Question


Phillip 0

Hey, we have a Function App on the Flex Consumption plan, written in Node.js, that reads a CSV file (50k rows, 2 MB) and merges it row by row into an Azure Data Table. The rows are added one after another. After ~10k rows the function stops logging to Application Insights: no error, and nothing in the invocations overview. I tried to figure out whether it happens because of any quotas, but memory usage is low and the duration is ~3 min.

When I introduce batching, e.g. merging 10 rows in parallel, it works.

Batching might be a good solution, but I am curious what causes the issue. Any idea?

Info: I cannot use transaction-based batching because I have unique partition keys.

Appreciate your help!

Best,
Phillip

Rakesh Mishra 9,795 Reputation points Microsoft External Staff Moderator

2026-06-08T12:20:12.8066667+00:00
Hello Phillip,

Thank you for reaching out to the Microsoft Q&A community. Experiencing a "silent hang" where your Function App stops processing and logging without throwing an explicit error can definitely be frustrating to troubleshoot.

When dealing with ~50,000 sequential inserts to Azure Table Storage via a Node.js Flex Consumption Function, the behavior you are seeing (stalling out around ~10k operations) typically points to one of two common platform constraints:

1. SNAT Port / Socket Exhaustion: Even though you are awaiting operations sequentially, sending tens of thousands of individual HTTP requests (the Azure Tables REST API is HTTP-based under the hood) can exhaust the outbound connections (SNAT ports) allocated to your function's sandbox if connections are not reused. When SNAT ports are exhausted, outbound requests can simply "hang" waiting for a socket to open, which would produce the silent deadlock behavior you observed.

According to the official documentation on Azure Functions scale and hosting: "Connections might be exhausted... if you make a large number of outbound requests. To avoid this, you should share client instances and reuse connections."

2. Function Execution Timeout: If you execute 50,000 table merges sequentially and each operation takes even a modest 20-30 milliseconds, the total execution time will be roughly 17 to 25 minutes (50,000 × 20 ms = 1,000 s; 50,000 × 30 ms = 1,500 s). In a Flex Consumption plan, functions are subject to a configurable execution timeout (functionTimeout in host.json). If the timeout is reached, the host environment may forcefully shut down the worker process before Application Insights has a chance to flush the final batch of telemetry, resulting in a sudden drop-off in logs.
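As a sanity check on that arithmetic, a tiny estimate helps. This is a sketch that assumes a fixed per-merge latency and ignores throttling and retries; `estimateSeconds` is a hypothetical helper, not part of any SDK.

```javascript
// Rough wall-clock estimate for `totalOps` table merges.
// Assumes a fixed latency per merge and that each group of `concurrency`
// merges runs fully in parallel (no throttling, no retries).
function estimateSeconds(totalOps, latencyMs, concurrency = 1) {
  const roundTrips = Math.ceil(totalOps / concurrency); // sequential waits
  return (roundTrips * latencyMs) / 1000;
}

const minutes = (s) => (s / 60).toFixed(1);
console.log(`Sequential: ${minutes(estimateSeconds(50_000, 20))} to ${minutes(estimateSeconds(50_000, 30))} minutes`);
console.log(`10 at a time: ${estimateSeconds(50_000, 20, 10)} to ${estimateSeconds(50_000, 30, 10)} seconds`);
// Sequential: 16.7 to 25.0 minutes
// 10 at a time: 100 to 150 seconds
```

Plugging in the ~18 ms per merge implied by the original observation (~10k rows in ~3 min), `estimateSeconds(10_000, 18)` gives 180 s, consistent with the reported duration.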

How to Resolve This Issue:

Your discovery that batching rows in parallel (e.g., using Promise.all for 10 rows at a time) works is actually the recommended best practice for this scenario. Since your entities have different Partition Keys, you cannot use native Azure Table Batch Transactions, but you can control concurrency natively in Node.js.

To ensure stability, I recommend combining concurrency chunking with HTTP keepAlive to reuse sockets:

Implement Concurrency Control: Continue grouping your promises. Using a library like p-limit, or simply splitting your array into chunks of 10-50 and awaiting each chunk with Promise.all, reduces the overall execution time significantly and keeps it well within the timeout limits.
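For the p-limit-style option, a dependency-free sketch might look like this. `mapWithConcurrency` is a hypothetical helper (not p-limit's API): it keeps at most `limit` calls in flight, so one slow merge occupies a single slot instead of holding back a whole `Promise.all` chunk.

```javascript
// Hypothetical helper: run `worker(item, index)` over `items` with at most
// `limit` calls in flight (limit >= 1). Each slot pulls the next unclaimed
// item as soon as its previous call finishes.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runSlot() {
    while (next < items.length) {
      const index = next++; // safe: JavaScript runs this synchronously
      results[index] = await worker(items[index], index);
    }
  }

  const slots = Array.from({ length: Math.min(limit, items.length) }, () => runSlot());
  await Promise.all(slots);
  return results;
}

// Demo with a fake async "merge" that just waits 5 ms per row.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
mapWithConcurrency([1, 2, 3, 4, 5], 2, async (row) => {
  await sleep(5);
  return row * 10;
}).then((out) => console.log(out)); // [ 10, 20, 30, 40, 50 ]
```

In the function, the worker would be the call used in this thread, e.g. `(entity) => client.upsertEntity(entity, "Merge")`. As with `Promise.all`, the first rejection rejects the returned promise, while the other slots keep working through the remaining items.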

Enable keepAlive on the TableClient: Ensure your SDK client reuses TCP connections rather than opening a new socket for every merge.

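const { TableClient } = require("@azure/data-tables");
const { DefaultAzureCredential } = require("@azure/identity");
const https = require("https");

// Create a custom HTTPS agent with keepAlive enabled to prevent socket exhaustion
const agent = new https.Agent({ keepAlive: true });
const credential = new DefaultAzureCredential();

const client = new TableClient(
    "https://<your-account>.table.core.windows.net",
    "tableName",
    credential,
    {
        // Pass the keepAlive agent into the client options.
        // Note: as reported later in this thread, TableClient may not accept this
        // options shape; wrapping the HTTP client's sendRequest is an alternative.
        requestOptions: { agent: agent }
    }
);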

References:

Managing connections in Azure Functions

Troubleshoot SNAT port exhaustion

Could you check the configured functionTimeout in your host.json file to see if the silent failure correlates with the maximum execution limit of your Flex Consumption app?

Note: This response is drafted with the help of AI systems.
Phillip 0 Reputation points

2026-06-08T14:16:03.8766667+00:00
Thank you! I will give it a try. Does it also work with TableClient.fromConnectionString()?
Phillip 0 Reputation points

2026-06-08T15:39:11.35+00:00

I tested it with new TableClient, but the options object does not seem to be right.

Phillip 0 Reputation points

2026-06-09T11:41:37.26+00:00

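import * as https from "https";
import { TableClient } from "@azure/data-tables";
import { createDefaultHttpClient } from "@azure/core-rest-pipeline";
import { DefaultAzureCredential } from "@azure/identity";

const keepAliveAgent = new https.Agent({ keepAlive: true });
const baseHttpClient = createDefaultHttpClient();
const httpClient = {
    sendRequest: (request: any) => {
        // Route every SDK request through the shared keep-alive agent
        request.agent = keepAliveAgent;
        return baseHttpClient.sendRequest(request);
    },
};
const client = new TableClient(
    `https://${accountName}.table.core.windows.net`,
    tableName,
    new DefaultAzureCredential(),
    { httpClient }
);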

I tried it like that, but it stopped after ~16k rows with the same behaviour.

Rakesh Mishra 9,795 Reputation points Microsoft External Staff Moderator

2026-06-11T11:26:37.44+00:00
Hello Phillip,

Thanks for the update and for sharing your configuration! Implementing the keepAlive custom HTTP client is a great step, and it should largely rule out SNAT port (socket) exhaustion as the cause.

However, since your function is an HTTP Trigger and it appears you are still iterating through the table merges sequentially (one await at a time), your function has hit a different, hardcoded architectural wall: the Azure Load Balancer timeout.

The 230-Second Limit

Let's do some quick math: if a single Azure Table merge takes roughly 15 milliseconds, processing 16,000 records sequentially takes about 240 seconds (16,000 × 15 ms = 240 s, or 4 minutes). That is just past a known platform limit.

According to the official Azure Functions scale and hosting documentation:

"Regardless of the function app timeout setting, 230 seconds is the maximum amount of time that an HTTP triggered function can take to respond to a request. This limit exists because of the default idle timeout of Azure Load Balancer."

When an HTTP trigger runs for more than 230 seconds without returning an HTTP response to the caller, the underlying Azure Load Balancer drops the connection. This could explain why the function appears to simply "stop" around the 16k mark with no errors logged in Application Insights.

The Fix: You Must Implement Chunking

While your keepAlive agent keeps the connection healthy, it doesn't make the code run any faster. To stay within the 230-second load balancer limit, you must process the rows concurrently in batches using Promise.all(). This can compress the total execution time from several minutes down to seconds, depending on storage throughput.

Here is how you apply concurrency to your existing setup:

const { DefaultAzureCredential } = require("@azure/identity");
const { TableClient } = require("@azure/data-tables");
const { createDefaultHttpClient } = require("@azure/core-rest-pipeline");
const https = require("https");

// 1. Your custom keepAlive logic
const keepAliveAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });
const baseHttpClient = createDefaultHttpClient();
const httpClient = {
    sendRequest: (request) => {
        request.agent = keepAliveAgent;
        return baseHttpClient.sendRequest(request);
    },
};
const client = new TableClient(
    `https://${accountName}.table.core.windows.net`,
    tableName,
    new DefaultAzureCredential(),
    { httpClient }
);

module.exports = async function (context, req) {
    const entitiesToUpload = req.body.entities; // Assuming 50k rows passed in

    // 2. Chunking logic (DO NOT process 1 by 1)
    const chunkSize = 100; // Process 100 concurrently
    for (let i = 0; i < entitiesToUpload.length; i += chunkSize) {
        const chunk = entitiesToUpload.slice(i, i + chunkSize);
        // Fire up to 100 HTTP requests in parallel, and await the whole batch
        await Promise.all(chunk.map(entity => client.upsertEntity(entity, "Merge")));
        const merged = i + chunk.length; // entities completed so far
        if (merged % 5000 === 0) {
            context.log(`Successfully merged ${merged} entities...`);
        }
    }
    context.res = { body: "All entities merged successfully!" };
};

By adding Promise.all() to fire off 100 requests concurrently, your 50,000 records should finish processing well before the 230-second Azure Load Balancer timeout drops the connection.

Note: This code is generated with the help of AI systems.
Phillip 0 Reputation points

2026-06-11T11:58:02.4633333+00:00

Thank you for the answer. In my case I use a timer trigger, so the HTTP trigger timeout should not occur. And it would still be great to get an error or a log for that.

Rakesh Mishra 9,795 Reputation points Microsoft External Staff Moderator

2026-06-11T13:58:31.71+00:00

Even with a Timer Trigger's generous 30-minute window, running 50,000 network requests one-by-one is highly susceptible to transient network drops. You should combine your custom HTTP client with Concurrency Chunking (Promise.all()) to process the batch in seconds rather than minutes.

Please try the following:

const { DefaultAzureCredential } = require("@azure/identity");
const { TableClient } = require("@azure/data-tables");
const { createDefaultHttpClient } = require("@azure/core-rest-pipeline");
const https = require("https");
// 1. KeepAlive Agent
const keepAliveAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });
const baseHttpClient = createDefaultHttpClient();
const httpClient = {
    sendRequest: (request) => {
        request.agent = keepAliveAgent;
        // Enforce a 10-second timeout to prevent silent promise hangs!
        request.timeout = 10000; 
        return baseHttpClient.sendRequest(request);
    },
};
const client = new TableClient(
    `https://${accountName}.table.core.windows.net`,
    tableName,
    new DefaultAzureCredential(),
    { httpClient }
);
module.exports = async function (context, myTimer) {
    context.log("Timer trigger starting batch merge...");
    
    // Assuming entitiesToUpload is your array of 50k entities
    const entitiesToUpload = [...]; 
    const chunkSize = 100; // Process 100 concurrently
    
    try {
        for (let i = 0; i < entitiesToUpload.length; i += chunkSize) {
            const chunk = entitiesToUpload.slice(i, i + chunkSize);
            
            // 2. Chunking logic: up to 100 merges in flight per chunk
            await Promise.all(chunk.map(entity => client.upsertEntity(entity, "Merge")));
            
            const merged = i + chunk.length; // entities completed so far
            if (merged % 5000 === 0) {
                context.log(`Successfully merged ${merged} entities...`);
            }
        }
        context.log(`All ${entitiesToUpload.length} entities merged successfully!`);
        
    } catch (error) {
        // If a network issue occurs now, it will be caught and explicitly logged
        context.log.error("Error occurred during batch upload: ", error.message);
        throw error; 
    }
}
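The 10-second `request.timeout` above targets the HTTP layer. The same fail-loudly idea can also be applied at the promise level, so that any hang surfaces as a catchable, loggable error. `withTimeout` is a hypothetical helper; note that it only stops waiting and does not cancel the underlying HTTP request.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds, so a hung
// call turns into an error that the surrounding try/catch can log.
// This stops *waiting* only; the underlying request is not cancelled.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Demo: a promise that never settles, standing in for a hung request.
const never = new Promise(() => {});
withTimeout(never, 50, "merge row 42").catch((err) => console.error(err.message));
// logs: merge row 42 timed out after 50 ms
```

Inside the chunk, this would wrap each call, e.g. `withTimeout(client.upsertEntity(entity, "Merge"), 15000, "merge")`, so a stuck merge rejects the `Promise.all` and reaches the `catch` block instead of stalling silently.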

Rakesh Mishra 9,795 Reputation points Microsoft External Staff Moderator

2026-06-16T14:17:53.09+00:00

Hello Phillip, following up to see if you had a chance to check my previous response and if it was helpful. Please do let me know if you're still facing the issue and need any further assistance on this.
Phillip 0 Reputation points

2026-06-16T14:39:16.2933333+00:00

Hi Rakesh, we are already using chunking, but I am still worried that it might break at some volume of data, and I am still trying to figure out what the exact issue is. It would be good to see something in the logs in that case.
Rakesh Mishra 9,795 Reputation points Microsoft External Staff Moderator

2026-06-19T14:42:48.0466667+00:00

Hi Phillip, could you please check your private messages and share the requested information so I can assist you further?

Answer 1

Hi Phillip, thanks for sharing your issue here on the Q&A portal.

It sounds like the function is getting stuck on the async/storage side rather than hitting a hard memory or duration limit.

Doing 50k table merges one by one means a lot of round trips. After ~10k operations you may be running into slow retries, socket exhaustion, throttling, or a backed-up Node event loop. Application Insights can also go quiet if the worker is still alive but stuck waiting on pending I/O. The fact that 10 parallel merges works is a good clue: it means the storage account/table can handle the load, and the fully sequential pattern is probably too slow or getting trapped in retries somewhere.
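One way to get a log line even when nothing fails is a stall watchdog. This sketch (a hypothetical `createStallWatchdog`, not an Azure API) warns once when no progress has been recorded for `stallMs`. It can only fire while the event loop is free, which fits the "stuck waiting on pending I/O" case; it will not help if the host kills the process outright.

```javascript
// Hypothetical helper: warn once per stall when no progress has been
// recorded for `stallMs` milliseconds. Call tick() after each completed
// merge and stop() when the run ends.
function createStallWatchdog(stallMs, log = console.warn) {
  let lastProgress = Date.now();
  let count = 0;
  let warned = false;
  const timer = setInterval(() => {
    const idle = Date.now() - lastProgress;
    if (idle >= stallMs && !warned) {
      warned = true;
      log(`no progress for ${idle} ms (last count: ${count})`);
    }
  }, Math.max(10, Math.floor(stallMs / 2)));
  timer.unref?.(); // don't keep the process alive just for the watchdog
  return {
    tick() {
      count++;
      lastProgress = Date.now();
      warned = false; // re-arm for the next stall
    },
    stop() {
      clearInterval(timer);
    },
  };
}

// Example wiring (shown as comments):
// const watchdog = createStallWatchdog(60_000, (msg) => context.log(msg));
// after each merge: watchdog.tick();  at the end: watchdog.stop();
```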

Since you have unique partition keys, a normal transactional batch won't help: an entity group transaction requires every entity in the batch to share the same PartitionKey. https://learn.microsoft.com/en-us/rest/api/storageservices/performing-entity-group-transactions
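To see why, a small sketch that groups entities into candidate transaction batches helps. It assumes the `@azure/data-tables` entity shape (`partitionKey`, `rowKey`) and the limit of 100 operations per entity group transaction; `planTransactionBatches` is a hypothetical helper.

```javascript
// Entity group transactions: all entities must share one partitionKey,
// with at most 100 operations per transaction.
const MAX_OPS_PER_TRANSACTION = 100;

// Hypothetical helper: group entities into candidate transaction batches.
function planTransactionBatches(entities) {
  const byPartition = new Map();
  for (const entity of entities) {
    if (!byPartition.has(entity.partitionKey)) byPartition.set(entity.partitionKey, []);
    byPartition.get(entity.partitionKey).push(entity);
  }
  const batches = [];
  for (const group of byPartition.values()) {
    for (let i = 0; i < group.length; i += MAX_OPS_PER_TRANSACTION) {
      batches.push(group.slice(i, i + MAX_OPS_PER_TRANSACTION));
    }
  }
  return batches;
}

// With a unique partitionKey per row, every batch holds a single entity,
// so transactions save no round trips at all.
const uniqueRows = Array.from({ length: 5 }, (_, i) => ({ partitionKey: `p${i}`, rowKey: "r" }));
console.log(planTransactionBatches(uniqueRows).length); // 5
```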

The best fix is what you already tested: process in small chunks with controlled concurrency, e.g. 10-50 at a time, plus retry with backoff for 429, 503, and timeouts. Don't fire all 50k at once though; that just moves the fire from the kitchen to the garage.
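A sketch of retry with exponential backoff and jitter follows. The error shape is an assumption: it treats `err.statusCode` 429/503 (as carried by the Azure SDK's `RestError`) and a hypothetical `err.code === "ETIMEDOUT"` as retryable. The Azure SDK already applies its own retry policy, so wrapping calls like this multiplies the attempts; the main gain is an explicit log line per retry.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumed error shape: Azure SDK RestError exposes `statusCode`;
// `code === "ETIMEDOUT"` is a hypothetical stand-in for a timeout.
function isRetryable(err) {
  return err?.statusCode === 429 || err?.statusCode === 503 || err?.code === "ETIMEDOUT";
}

// Retry `operation` with exponential backoff (base, 2x base, 4x base, ...)
// plus up to 100% random jitter, logging every retry.
async function withRetry(operation, { maxAttempts = 5, baseDelayMs = 200, log = console.warn } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1) * (1 + Math.random());
      log(`attempt ${attempt} failed (${err.statusCode ?? err.code}); retrying in ${Math.round(delayMs)} ms`);
      await sleep(delayMs);
    }
  }
}

// Usage inside a chunk, with the merge call from this thread:
// await withRetry(() => client.upsertEntity(entity, "Merge"));
```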

It's worth logging every 500 or 1,000 rows with the current row number, and catching/logging each failed merge. I'd also log SDK retry attempts if possible, because my guess is the function isn't deadlocked: it's sitting inside storage SDK retries or waiting on network I/O.
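A minimal sketch of that logging, assuming a caller-supplied async `mergeRow` function (hypothetical; it would wrap `client.upsertEntity(row, "Merge")`). `Promise.allSettled` records every failed row instead of letting the first rejection end the run, and a progress line is written each time another `logEvery` rows complete.

```javascript
// Hypothetical helper: merge rows chunk by chunk, record every failed row,
// and log progress each time another `logEvery` rows have completed.
// `mergeRow` is caller-supplied, e.g. (row) => client.upsertEntity(row, "Merge").
async function mergeWithProgress(rows, mergeRow, { chunkSize = 20, logEvery = 1000, log = console.log } = {}) {
  const failures = [];
  let done = 0;
  for (let i = 0; i < rows.length; i += chunkSize) {
    const chunk = rows.slice(i, i + chunkSize);
    // allSettled waits for the whole chunk and never rejects early.
    const results = await Promise.allSettled(chunk.map(async (row) => mergeRow(row)));
    results.forEach((result, j) => {
      if (result.status === "rejected") {
        failures.push({ index: i + j, reason: result.reason });
        log(`row ${i + j} failed: ${result.reason?.message ?? result.reason}`);
      }
    });
    const before = done;
    done += chunk.length;
    // Log when a multiple of logEvery is crossed, and once at the very end.
    if (Math.floor(done / logEvery) > Math.floor(before / logEvery) || done === rows.length) {
      log(`processed ${done}/${rows.length} rows (${failures.length} failures so far)`);
    }
  }
  return failures;
}
```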

For longer imports, Durable Functions or a queue-based fan-out is cleaner: put rows/chunks onto a queue and let multiple function executions process them. https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview So yes, batching is not just a workaround here. For 50k remote table writes, controlled concurrency is basically the right design.
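For the queue-based fan-out, the producer side mostly means slicing the CSV rows into message-sized pieces. This sketch assumes a hypothetical message shape `{ startIndex, rows }` and checks each payload against the 64 KiB Azure Queue Storage message limit; if messages are Base64-encoded, the usable size is roughly three quarters of that.

```javascript
// Azure Queue Storage limits a message to 64 KiB.
const MAX_MESSAGE_BYTES = 64 * 1024;

// Hypothetical producer step: split rows into JSON messages of at most
// `rowsPerMessage` rows, each tagged with the index of its first row so a
// failing message can be traced back to the CSV.
function toQueueMessages(rows, rowsPerMessage = 100) {
  const messages = [];
  for (let start = 0; start < rows.length; start += rowsPerMessage) {
    const body = JSON.stringify({ startIndex: start, rows: rows.slice(start, start + rowsPerMessage) });
    const bytes = Buffer.byteLength(body, "utf8");
    if (bytes > MAX_MESSAGE_BYTES) {
      throw new Error(`message starting at row ${start} is ${bytes} bytes; lower rowsPerMessage`);
    }
    messages.push(body);
  }
  return messages;
}

const sampleRows = Array.from({ length: 250 }, (_, i) => ({ partitionKey: `p${i}`, rowKey: "r" }));
console.log(toQueueMessages(sampleRows).length); // 3
```

Each queue-triggered execution would then parse one message and merge its rows with controlled concurrency, so no single invocation has to run for minutes.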

rgds,

Alex


Phillip 0 Reputation points

2026-06-23T13:26:36.53+00:00

Hi Alex, thank you for the idea. I might check it out! What I still don't get is why it doesn't work when I go through it row by row, without any parallel merges.
