Server busy errors in Azure Event Hubs are throttling signals that are raised when the service detects that either a partition or the namespace is exceeding its provisioned throughput or certain operational limits, even if the aggregated Incoming Bytes metric does not look high.
The criteria are as follows:
- Partition-level throughput hot spots (error code 50002)
  - If traffic is unevenly distributed and one partition hits its local throughput-unit (TU) limit, Event Hubs returns a `ServerBusyException` even though the overall namespace Incoming Bytes may appear moderate.
  - Resolution is to revise the partition distribution strategy or send without a partition key so that Event Hubs can balance load across partitions.
- Namespace-level throughput-unit limits (error code 50002)
  - Each TU provides up to 1 MB/s ingress (and ≤1000 ingress requests/s) and 2 MB/s egress (and ≤4096 egress events/s) across the namespace.
  - When real-time ingress/egress or request rate exceeds `TUs × limits`, Event Hubs throttles and emits server busy errors.
  - The portal's metrics are 1-minute aggregates; instantaneous spikes above the TU limits can trigger throttling even if the 1-minute Incoming Bytes sum looks low.
- Excessive management/metadata calls (error code 50008)
  - If the container running the namespace is CPU constrained, or if there are too many calls to operations like `GetRuntimeInformation`/`GetRuntimeInfo`, Event Hubs can return `ServerBusyException` with error code 50008.
  - Event Hubs supports up to 50 `GetRuntimeInfo` calls per second per consumer group; exceeding this can cause short bursts of server busy errors.
- How to interpret and investigate the metric
  - Use the Throughput and Requests tabs on the namespace Overview → Show metrics blade.
  - For the time ranges where `Server Busy Errors` spike, check:
    - Incoming Bytes / Outgoing Bytes peaks, divided by 60 to approximate bytes per second.
    - Incoming Requests at peak, also divided by 60, compared to the TU limits (1000 ingress requests/s per TU).
  - If peak values exceed the TU-based limits, increase TUs on the Scale page or enable Auto-inflate so the namespace can scale up automatically.
- Client behavior and retries
  - For AMQP/Event Hubs clients, `ServerBusyException` is a transient error that should be handled with retries and backoff. For example, the Java `ServerBusyException` documentation recommends waiting at least 4 seconds before retrying operations for that entity.
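
The retry guidance in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the SDK's own retry policy: `ServerBusyError`, `backoff_delays`, and `send_with_retry` are hypothetical names, and the 4-second base delay is simply the minimum wait mentioned above. The official client libraries generally ship their own retry options, which would normally be the first thing to tune.

```python
import random
import time


class ServerBusyError(Exception):
    """Hypothetical stand-in for the client library's server-busy exception."""


def backoff_delays(max_retries, base_delay=4.0, max_delay=60.0, jitter=0.0):
    """Yield one wait time per retry: base_delay, 2x, 4x, ... capped at max_delay."""
    for attempt in range(max_retries):
        delay = min(base_delay * (2 ** attempt), max_delay)
        # Optional random jitter so many clients do not retry in lockstep.
        yield delay + random.uniform(0, jitter)


def send_with_retry(send, max_retries=5, base_delay=4.0, max_delay=60.0,
                    jitter=1.0, sleep=time.sleep):
    """Call send(); on ServerBusyError wait and retry, re-raising once retries run out."""
    delays = backoff_delays(max_retries, base_delay, max_delay, jitter)
    while True:
        try:
            return send()
        except ServerBusyError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the error to the caller
            sleep(delay)
```

Here `send` would be whatever callable performs the actual send in your application. The `sleep` parameter is injectable only so the wait logic can be exercised without real delays.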
In summary, server busy errors are triggered when either a single partition or the namespace as a whole exceeds throughput or operational limits (bytes/s, requests/s, or certain management-call limits). Short spikes and partition hot spots can cause these errors even when the aggregated Incoming Bytes metric appears low.
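
As a rough worked example of the metric check described above, the sketch below converts 1-minute sums into average per-second rates and expresses them as a fraction of the TU-based limits quoted in the list. The function name `tu_utilization` and the sample numbers are illustrative assumptions, and 1 MB is taken as 10^6 bytes. Because the inputs are per-minute averages, a value below 1.0 does not rule out a shorter burst that exceeded the limit.

```python
# Per-TU limits as quoted in the text above (1 MB assumed to be 10**6 bytes).
INGRESS_BYTES_PER_SEC_PER_TU = 1_000_000
INGRESS_REQUESTS_PER_SEC_PER_TU = 1000
EGRESS_BYTES_PER_SEC_PER_TU = 2_000_000


def tu_utilization(incoming_bytes, incoming_requests, outgoing_bytes, tus):
    """Return average per-second usage as a fraction of the TU limits.

    The first three arguments are 1-minute sums as shown in the portal;
    a result above 1.0 means the average rate alone exceeded the limit.
    """
    rates = {
        "ingress_bytes": incoming_bytes / 60,
        "ingress_requests": incoming_requests / 60,
        "egress_bytes": outgoing_bytes / 60,
    }
    limits = {
        "ingress_bytes": tus * INGRESS_BYTES_PER_SEC_PER_TU,
        "ingress_requests": tus * INGRESS_REQUESTS_PER_SEC_PER_TU,
        "egress_bytes": tus * EGRESS_BYTES_PER_SEC_PER_TU,
    }
    return {name: rates[name] / limits[name] for name in rates}


# Hypothetical minute on a 2-TU namespace: 90 MB in, 150,000 requests, 60 MB out.
usage = tu_utilization(90_000_000, 150_000, 60_000_000, tus=2)
for name, fraction in usage.items():
    print(f"{name}: {fraction:.0%} of limit")
```

In this made-up minute, ingress bytes average 75% of the limit while ingress requests average 125%, which illustrates how throttling can occur even though Incoming Bytes looks moderate.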