Azure AI Document Understanding - Quotas and Metrics

Question


Jack Halpern 20

We have been analyzing documents successfully with a Custom Classifier and a Custom schema for almost six months. Using the Azure.AI.ContentUnderstanding NuGet package with WaitUntil.Completed on our AnalyzeAsync calls, we've recently encountered situations in which AnalyzeAsync never returns. We suspect that we've somehow exceeded our quota, or that the request is being lost or dropped.

We only have one subscription and deploy our projects in the East region. We believe we're in Tier2. Using the Metrics blade (see screenshot) I've tried to figure out if, in fact, we're reaching quota limit. But I don't know where to begin.

My question: Is there a simple screen or set of screens I can use to verify if it's a quota limit problem or something else? Or some combination of Metric and Aggregation which we should focus on? Any guidance you can give to help us determine why some documents never complete would be appreciated.

Thanks

Jack

[Screenshot: Azure portal Metrics blade]

Jack Halpern 20 Reputation points

2026-03-16T21:13:23.2066667+00:00

"Focus on page‑based metrics, not request metrics." I don't see any page-based metrics on the Metrics page. Am I looking in the wrong place?
Jack Halpern 20 Reputation points

2026-03-16T23:42:05.2866667+00:00
"This kind of behavior usually happens when the service is busy or throttled, not because the request is completely lost"

This doesn't seem to match what we're seeing. Just today we had about 30 documents work fine over a 30-minute period. An hour later we submitted 8 documents: 5 worked fine and 3 never responded for the entire day. A few hours later we submitted about another 20, and again 6 of them still haven't responded. It seems that those docs are in fact "lost". Just now (a few hours later) I re-submitted one of the "lost" docs and it returned fine about a minute later. The original submission still hasn't returned. So I think it's something more than just "This can look like AnalyzeAsync never returning". I'm pretty sure it is, in fact, never returning.

I run with these options, but I don't think the Retry is ever being invoked. When I run with Diagnostics on I just see the endless polling for status every 2 seconds.

var clientOptions = new ContentUnderstandingClientOptions()
{
    Retry =
    {
        Delay = TimeSpan.FromSeconds(30),    // The initial delay between retries
        MaxRetries = 2,                      // The maximum number of retry attempts
        Mode = RetryMode.Exponential,        // Use exponential backoff mode
        MaxDelay = TimeSpan.FromSeconds(60), // The maximum delay allowed between retries
    },
    Diagnostics =
    {
        IsLoggingContentEnabled = true,
        IsLoggingEnabled = true,
    }
};
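The retry settings above bound how long the client spends re-sending a single failed HTTP request; they do not control the status-polling loop. As a rough worked example, the sketch below computes the nominal back-off delays these options imply, assuming the common exponential shape delay × 2^(attempt − 1) capped at MaxDelay. The real policy may also apply random jitter, which this sketch ignores, so treat the numbers as approximate. RetryMath and NominalDelays are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RetryMath
{
    // Nominal exponential back-off: baseDelay * 2^(attempt - 1), capped at maxDelay.
    // Assumption: approximates the shape of RetryMode.Exponential, without jitter.
    public static List<TimeSpan> NominalDelays(TimeSpan baseDelay, TimeSpan maxDelay, int maxRetries)
    {
        var delays = new List<TimeSpan>();
        for (int attempt = 1; attempt <= maxRetries; attempt++)
        {
            double seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt - 1);
            delays.Add(TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds)));
        }
        return delays;
    }
}
```

With Delay = 30 s, MaxDelay = 60 s and MaxRetries = 2, this gives nominal waits of 30 s and 60 s, roughly 90 s of back-off before a failing request surfaces an error. That budget applies per HTTP request, so it cannot account for a call that stays pending all day: if every status poll succeeds, the policy has nothing to retry.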
Anshika Varshney 9,335 Reputation points Microsoft External Staff Moderator

2026-03-18T17:48:02.89+00:00

Thanks for the detailed explanation. What you are describing does help clarify the behavior, and I agree this feels different from simple short‑term throttling.

In Azure AI Document Understanding, when you submit a document for analysis, the service accepts the request first and then processes it asynchronously in the backend. Quotas and throttling are enforced at the processing layer, not at request submission. Because of that, a request can be accepted successfully but later get stuck in the processing queue when the service is under pressure.

When this happens, the request is not actually lost, but it may never reach a state where it completes or fails. From the client side, this looks like AnalyzeAsync never returning and endless polling, exactly as you are seeing. The retry policy you configured does not trigger in this case because there is no transport error, timeout, or failed HTTP response. The service keeps returning a valid status response, so the SDK believes the request is still in progress and continues polling.
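To illustrate the mechanism described above, here is a minimal, SDK-free simulation of a status-polling loop. Each poll returns a valid status, so nothing looks like an error to a retry policy; only an explicit cap (the hypothetical maxPolls parameter) ends a job that stays in running. The status strings are assumptions chosen to match the states mentioned in this thread, not the SDK's exact values.

```csharp
using System;
using System.Collections.Generic;

public static class PollingDemo
{
    // Statuses treated as final; anything else means "keep polling".
    // Assumption: illustrative names, not the SDK's exact enum values.
    static readonly HashSet<string> Terminal = new(StringComparer.OrdinalIgnoreCase)
    {
        "succeeded", "failed", "canceled"
    };

    // Polls getStatus until a terminal status appears or maxPolls is reached.
    // Each call to getStatus stands in for a successful status response, which is
    // why a transport-level retry policy never fires while a job sits in "running".
    public static (string Outcome, int Polls) PollUntilTerminal(Func<string> getStatus, int maxPolls)
    {
        for (int poll = 1; poll <= maxPolls; poll++)
        {
            string status = getStatus();
            if (Terminal.Contains(status))
                return (status, poll);
        }
        return ("timedOut", maxPolls);
    }
}
```

Without a cap, a source that always answers "running" would keep this loop going indefinitely, which matches the endless two-second polling seen in the diagnostics output.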

This also explains why re‑submitting the same document later works fine while the original submission never completes. The second request is treated as a new job and may land on a healthy backend path, while the original job remains stuck in a degraded processing state.

This behavior is most commonly seen during periods of uneven load or partial service degradation. It does not affect every request, which is why you see some documents complete successfully in the same time window while others remain pending for hours.

A few practical things to consider when handling this pattern:

If a document stays in running or notStarted state for an unusually long time, it is safer to treat it as a stalled job and resubmit it with a new operation rather than waiting indefinitely.

Client retries only help for transient HTTP failures. They do not help when the service has already accepted the request and the job is stuck server‑side.

Tracking operation duration on your side and applying a maximum wait time per document is usually more reliable than relying only on SDK retries.
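The advice above (a per-document maximum wait, then resubmission as a new operation) can be sketched without any SDK types. The hypothetical helper below takes a delegate that submits a document and waits for its result, gives each attempt its own deadline via CancellationTokenSource.CancelAfter, and starts a fresh attempt when that deadline passes. It assumes the delegate passes its CancellationToken through to the SDK call (Azure SDK methods generally accept one, and cancelling it abandons the client-side wait). Cancelling does not cancel the job on the service side.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StalledJobGuard
{
    // Runs submitAndWait with a per-attempt deadline; on timeout, resubmits as a new job.
    // Hypothetical helper: submitAndWait would wrap your own AnalyzeAsync call.
    public static async Task<T> RunWithDeadlineAsync<T>(
        Func<CancellationToken, Task<T>> submitAndWait,
        TimeSpan maxWaitPerAttempt,
        int maxAttempts,
        CancellationToken outerToken = default)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            using var cts = CancellationTokenSource.CreateLinkedTokenSource(outerToken);
            cts.CancelAfter(maxWaitPerAttempt);
            try
            {
                return await submitAndWait(cts.Token).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
                when (cts.IsCancellationRequested && !outerToken.IsCancellationRequested)
            {
                // Our deadline (not the caller) cancelled the wait: treat the job as stalled.
                Console.WriteLine($"Attempt {attempt} exceeded {maxWaitPerAttempt}; resubmitting.");
            }
        }
        throw new TimeoutException($"No result after {maxAttempts} attempts.");
    }
}
```

Choosing a deadline well above your normal per-document processing time avoids abandoning jobs that are slow but healthy.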

This behavior is also why Microsoft documentation focuses on page‑based quotas and processing capacity rather than request count. Requests can be accepted even when processing capacity is temporarily constrained.

The asynchronous analyze behavior is explained here: https://learn.microsoft.com/azure/ai-services/document-intelligence/how-to-guides/use-sdk-rest-api

Your diagnostics output is consistent with how the service behaves when a job is accepted but never reaches completion. It is not that the request disappears, but that it never exits the processing pipeline.

We are currently reviewing the details. The team is actively working on the investigation, and we will update you as soon as we have more information. Thank you for your patience.
Jack Halpern 20 Reputation points

2026-03-20T13:02:54.7966667+00:00

Thanks for the detailed response. I have made some significant changes to my AI application which seem to have mitigated the problem in dev. The most significant change has been to limit the number of pages sent to the AI. For our application, the first few pages of a multi-page document contain most of the information we need, so if the PDF has more than 5 pages, we split it and send only the first part. We have only just moved that into production, but we expect it to reduce, if not eliminate, the issue.
Anshika Varshney 9,335 Reputation points Microsoft External Staff Moderator

2026-03-24T17:03:52.5066667+00:00

Hi Jack Halpern,
Document Intelligence and Document Understanding quotas are mainly enforced by pages processed and how many analyze operations you run at the same time. So reducing the number of pages you send can directly reduce load and lower the chance of long running or stuck requests. [learn.microsoft.com]

Moving the page limit into production is also a good way to confirm if the issue was caused by sustained load. If the problem comes back, it usually helps to watch two things in production.

First, watch processed pages and analyze volume around the time you see delays. The Azure portal metrics view can show processed pages and help you correlate spikes with the time requests start taking longer. [learn.microsoft.com]

Second is concurrency. Even if each document is smaller, running many Analyze calls in parallel can still create a backlog. That is why controlling parallelism along with page count is often the most stable setup.
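One common way to control parallelism on the client is a SemaphoreSlim that caps how many analyze calls are in flight at once. The sketch below is generic: BoundedAnalyzer and analyzeOne are hypothetical names, and the limit you pass should come from your tier's documented limits and your own measurements, not from this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedAnalyzer
{
    // Runs analyzeOne for every item, but never more than maxConcurrency at a time.
    // Results are returned in the same order as the input items.
    public static async Task<TResult[]> RunAllAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        Func<TItem, Task<TResult>> analyzeOne,
        int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync().ConfigureAwait(false); // wait for a free slot
            try
            {
                return await analyzeOne(item).ConfigureAwait(false);
            }
            finally
            {
                gate.Release(); // free the slot for the next document
            }
        }).ToList(); // materialize so every task starts and queues on the gate
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

Combined with the page cap described earlier, this keeps both the size and the number of simultaneous jobs under your control.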

One more thing to keep in mind is the service limits for your tier. For example, there are limits for analyze transactions per second and other operations depending on your pricing tier. Staying within those limits helps avoid throttling behavior. [learn.microsoft.com]

If you continue to see any hangs in production after this change, it would help to share whether it happens only on very large documents, only at peak traffic times, or randomly. That will make it easier for the community to suggest the next best tuning step.

Thank you!


Answer 1

Hi Jack Halpern,

This kind of behavior usually happens when the service is busy or throttled, not because the request is completely lost. For Document Understanding and Document Intelligence, quota is mainly enforced on pages processed and concurrent operations, not just on request count.

Here are a few ways to check whether this is a quota or capacity issue.

Focus on page‑based metrics, not request metrics. Document Intelligence processes documents by pages, and quotas are applied at that level. In the Azure portal Metrics blade for your Document Intelligence resource, look at metrics related to pages processed and analyze operations rather than raw API calls.
Check concurrent analyze operations. If many AnalyzeAsync calls are running at the same time, new requests can wait longer or appear to hang until capacity frees up. This can look like AnalyzeAsync never returning when the system is under sustained load.
Review the Document Intelligence service limits for custom models. If you are using Custom Classifier or Custom extraction models, make sure your usage stays within documented limits such as training size, number of pages, and model limits. Staying within these limits means the service should not reject requests due to quota. https://learn.microsoft.com/azure/ai-services/document-intelligence/service-limits
Use Azure Monitor metrics to correlate timing. If AnalyzeAsync hangs, check whether page processing spikes or long running operations line up with the time the request was submitted. This helps distinguish between quota pressure and transient service load.
Capture correlation IDs from SDK logs for the long‑running calls. Even when a request does not complete, the SDK logs usually include a correlation ID that can be used to trace the operation. This helps confirm whether the request reached the service and is still being processed. (See: How Document Intelligence works and is monitored.)
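For the correlation suggested in this list, it can help to keep your own per-document record of when each analyze call started, when (or whether) it finished, and whatever identifier you captured, so stalled documents can be lined up against the portal metrics timeline and the IDs handed to support. The record type and field names below are hypothetical. How you obtain an ID depends on the SDK: Azure SDK methods that take a WaitUntil argument typically return an Operation object with an Id property, and calling with WaitUntil.Started returns it before the wait begins. Confirm this for Azure.AI.ContentUnderstanding before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-document log entry; OperationId is whatever ID you can capture.
public sealed record AnalyzeRecord(string DocumentName, string? OperationId,
                                   DateTimeOffset SubmittedAt, DateTimeOffset? CompletedAt);

public static class StallReport
{
    // Returns documents that are still unfinished, or took longer than threshold,
    // measured against `now`; oldest first to match against metric timelines.
    public static List<AnalyzeRecord> FindStalled(IEnumerable<AnalyzeRecord> records,
                                                  DateTimeOffset now, TimeSpan threshold)
    {
        return records
            .Where(r => ((r.CompletedAt ?? now) - r.SubmittedAt) > threshold)
            .OrderBy(r => r.SubmittedAt)
            .ToList();
    }
}
```

The resulting list gives the submission times to check in the Metrics blade and the IDs to quote when opening a support case.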

In short, there is no single screen that says quota exceeded. The best signal is to compare page processing metrics and concurrency against the documented limits. If those stay within limits, the behavior is more likely due to temporary service load rather than quota exhaustion.

Hope this helps you narrow down where the issue is coming from.

Thank you!

Answer 2

Jack Halpern 20

"Capture correlation IDs from SDK logs for failing calls."

How do I get correlation ID from the AnalyzeAsync method?

This answer is for Document Intelligence not Document Understanding.
