How to scale Sync-over-Async HTTP Gateways without hitting Azure Service Bus Session limits?

Shivam Saluja 0 Reputation points
2026-04-13T17:37:00.5033333+00:00

We are building an edge API Gateway to bridge legacy synchronous HTTP clients with long-running, asynchronous AI workers (tasks taking 45+ seconds).

To prevent 504 Gateway Timeouts and thread exhaustion, we are using a Sync-over-Async pattern: the REST controller receives the request, drops it onto a Service Bus queue, and holds the HTTP connection open while it waits for the worker's reply.

The standard recommendation is to use Service Bus Sessions to correlate the request to the reply. However, as we scale, we are hitting severe bottlenecks:

Stateful Routing: The Gateway pod must hold a session lock. If that pod crashes, the session stays locked and the reply is trapped until the lock expires.

Standard Tier Limits: A traffic spike easily exhausts the 1,000 concurrent session limit on the Service Bus Standard tier.

Horizontal Scaling: Load balancers distribute incoming requests statelessly, but the session requires stateful tracking to get the reply back to the specific pod holding the open HTTP connection.

Is there a recommended architectural pattern to achieve request-reply correlation over Service Bus at high scale without relying on Sessions, keeping the Gateway pods 100% stateless?

Azure Service Bus

An Azure service that provides cloud messaging as a service and hybrid integration.

2 answers

  1. Q&A Assist
    2026-04-13T17:37:19.0233333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    Use a pure asynchronous request-reply pattern and remove the requirement to hold the HTTP connection open. That keeps the gateway stateless and avoids Service Bus session limits entirely.

    Key elements (gateway and worker code sketches follow below):

    1. Decouple HTTP request from worker reply
      • Front-end gateway:
        • Validates the request.
        • Enqueues a command message to Service Bus (no session required).
        • Immediately returns 202 Accepted with a URL to a status resource in the Location header.
      • This follows the Asynchronous Request-Reply pattern with HTTP polling, where the client no longer waits synchronously for the worker to complete.
    2. Use an HTTP status endpoint instead of in-memory correlation
      • The Location header points to a status endpoint, for example /operations/{operationId}.
      • The gateway persists operation state in a durable store (cache/DB) keyed by operationId, not in process memory.
      • Any gateway pod can handle GET /operations/{operationId} because all state is externalized.
      • While work is in progress, the status endpoint returns 200 OK with a body indicating the current state.
      • When work completes, the status endpoint either:
        • Returns the result directly, or
        • Returns 303 See Other redirecting to the final resource URL.
    3. Correlate via IDs, not Service Bus sessions
      • Include a correlation/operation ID in the command message sent to Service Bus.
      • The worker processes the message asynchronously and publishes a completion message or writes the result directly to the status store using that ID.
      • No Service Bus sessions are required; correlation is done at the application level via the ID.
    4. Make the gateway and workers fully stateless
      • Gateway pods:
        • Only send messages to Service Bus and read/write operation state from a shared store.
        • Do not hold open HTTP connections waiting for Service Bus replies.
      • Workers:
        • Consume from Service Bus using standard queues/topics.
        • Write results and state transitions to the shared store.
      • This aligns with asynchronous messaging guidance where services scale independently and failures in downstream services do not block the front end.
    5. Apply asynchronous messaging patterns for scale
      • Use queues and the Queue-Based Load Leveling and Competing Consumers patterns so workers can scale horizontally and process messages at their own rate.
      • Ensure message processing is idempotent so retries do not corrupt state.
      • Use Service Bus built-in retry and dead-letter capabilities for robustness.
    6. Manage client experience
      • Since operations are long-running (45+ seconds), design the UX around asynchronous completion:
        • Show “in progress” state based on the status endpoint.
        • Optionally send notifications (email, push, webhook) when the operation completes.
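
    A minimal sketch of the gateway side, assuming FastAPI, redis-py as the shared status store, and the azure-servicebus package (the queue name, route paths, and Redis key scheme are illustrative choices, not requirements):

    ```python
    # Gateway sketch: enqueue work, return 202 + Location, serve status polls.
    # Assumptions: FastAPI, redis-py as the shared store, azure-servicebus.
    # "ai-work-queue" and the "op:{id}" key scheme are illustrative names.
    import json
    import os
    import uuid

    import redis
    from azure.servicebus import ServiceBusClient, ServiceBusMessage
    from fastapi import FastAPI, Response

    app = FastAPI()
    store = redis.Redis()  # any durable shared store works; Redis is an example
    sb = ServiceBusClient.from_connection_string(os.environ["SB_CONN_STR"])

    @app.post("/analyze")
    def submit(payload: dict) -> Response:
        operation_id = str(uuid.uuid4())
        # State lives in the shared store, so ANY pod can answer status polls.
        store.set(f"op:{operation_id}", json.dumps({"status": "Running"}))
        msg = ServiceBusMessage(json.dumps(payload))
        msg.correlation_id = operation_id  # app-level correlation, no session
        with sb.get_queue_sender("ai-work-queue") as sender:
            sender.send_messages(msg)
        # Return immediately; the client polls the Location URL.
        return Response(status_code=202,
                        headers={"Location": f"/operations/{operation_id}"})

    @app.get("/operations/{operation_id}")
    def get_status(operation_id: str):
        raw = store.get(f"op:{operation_id}")
        if raw is None:
            return Response(status_code=404)
        # In-progress polls get 200 OK with the current state; once finished,
        # this could instead return 303 See Other to the final resource URL.
        return json.loads(raw)
    ```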

    This architecture:

    • Eliminates the need for Service Bus sessions and their concurrency limits.
    • Keeps gateway pods stateless and horizontally scalable.
    • Avoids stateful routing and session lock issues because no HTTP connection is tied to a specific worker or queue consumer.
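
    The matching worker side, under the same assumptions (run_model is a placeholder for the actual 45+ second AI job):

    ```python
    # Worker sketch: competing consumer that externalizes all state.
    import json
    import os

    import redis
    from azure.servicebus import ServiceBusClient

    store = redis.Redis()
    sb = ServiceBusClient.from_connection_string(os.environ["SB_CONN_STR"])

    def run_model(payload: dict) -> dict:
        ...  # placeholder for the long-running AI task

    with sb.get_queue_receiver("ai-work-queue") as receiver:
        for msg in receiver:  # competing consumers: add worker pods to scale out
            result = run_model(json.loads(str(msg)))
            # Idempotent: a redelivered message just overwrites the same key.
            store.set(f"op:{msg.correlation_id}",
                      json.dumps({"status": "Succeeded", "result": result}))
            receiver.complete_message(msg)
    ```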

  2. Rakesh Mishra 8,100 Reputation points Microsoft External Staff Moderator
    2026-04-13T18:44:51.1833333+00:00

    Hey Shivam,

    Great question, and nice work teasing out the root of the scaling issue with Sessions. You are completely right that pushing the routing logic down to the broker is the key to achieving stateless gateways, but you need to be very careful about how you implement dynamic subscriptions.

    The naive flow (creating a temporary subscription per request) is considered an anti-pattern at scale and will quickly hit Azure Service Bus quotas. Specifically:

    • The 2,000 Limit: Service Bus caps each topic at a maximum of 2,000 subscriptions. A traffic spike of 2,000+ concurrent requests exhausts that per-topic quota, and every request beyond it fails outright.
    • Control-Plane Latency: Creating and tearing down subscriptions are management API operations. They are heavily rate-limited and far too slow to put on the hot path of an HTTP request.
    • Orphaned Entities: If a pod crashes before deleting its per-request subscription, the subscription leaks. Even with AutoDeleteOnIdle, it takes a minimum of 5 minutes for the broker to clean it up.

    The Recommended Pattern: Per-Pod Temporary Subscriptions + In-Memory Correlation

    To achieve 100% statelessness without hitting limits, you should create a dynamic subscription per Gateway Pod, not per request. Here is the optimized flow (a Python sketch follows the list):

    1. Gateway Pod Startup (Control Plane)
      • When a Gateway pod spins up, it generates a unique ID (e.g., Pod-42).
      • It creates a single temporary subscription on the shared ReplyTopic with a SQL Filter: ReplyToPod = 'Pod-42'.
      • It configures AutoDeleteOnIdle = 5 minutes so the broker cleans it up if the pod scales down or crashes.
    2. Handling the HTTP Request (Data Plane)
      • The pod receives an HTTP request and generates a CorrelationId (e.g., Req-A1).
      • It registers a TaskCompletionSource (or CompletableFuture) into an in-memory ConcurrentDictionary, keyed by the CorrelationId.
      • It sends the work message to the worker queue, stamping both CorrelationId = 'Req-A1' and ReplyToPod = 'Pod-42'.
    3. Worker Processing
      • The AI worker does its 45+ second job and publishes the reply to the ReplyTopic, passing along the same CorrelationId and ReplyToPod properties.
    4. Broker-Driven Fan-in
      • Service Bus evaluates the SQL filter and pushes the message strictly to Pod-42's subscription.
      • The message pump on Pod-42 reads the message, extracts the CorrelationId, pulls the corresponding Task from the ConcurrentDictionary, and completes it—returning the HTTP response.
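
    Here is a rough Python sketch of steps 1, 2, and 4 (asyncio.Future and a plain dict play the roles of TaskCompletionSource and ConcurrentDictionary; the entity names ReplyTopic and ai-work-queue and the ReplyToPod property are illustrative, not fixed conventions):

    ```python
    # Per-pod subscription + in-memory correlation, sketched with asyncio.
    import asyncio
    import datetime
    import json
    import os
    import uuid

    from azure.servicebus import ServiceBusMessage
    from azure.servicebus.aio import ServiceBusClient
    from azure.servicebus.management import (
        ServiceBusAdministrationClient, SqlRuleFilter)

    CONN = os.environ["SB_CONN_STR"]
    POD_ID = f"pod-{uuid.uuid4().hex[:8]}"   # unique per gateway pod
    pending: dict[str, asyncio.Future] = {}  # CorrelationId -> parked request

    def create_pod_subscription() -> None:
        """Step 1 (control plane): run ONCE at pod startup, never per request."""
        admin = ServiceBusAdministrationClient.from_connection_string(CONN)
        admin.create_subscription(
            "ReplyTopic", POD_ID,
            auto_delete_on_idle=datetime.timedelta(minutes=5),  # GC on crash
        )
        # Swap the default match-all rule for a filter on this pod's ID.
        admin.delete_rule("ReplyTopic", POD_ID, "$Default")
        admin.create_rule("ReplyTopic", POD_ID, "OnlyMine",
                          filter=SqlRuleFilter(f"ReplyToPod = '{POD_ID}'"))

    async def handle_http_request(sb: ServiceBusClient, payload: dict) -> dict:
        """Step 2 (data plane): park the HTTP request on an in-memory Future."""
        correlation_id = str(uuid.uuid4())
        pending[correlation_id] = asyncio.get_running_loop().create_future()
        msg = ServiceBusMessage(json.dumps(payload))
        msg.correlation_id = correlation_id
        msg.application_properties = {"ReplyToPod": POD_ID}
        async with sb.get_queue_sender("ai-work-queue") as sender:
            await sender.send_messages(msg)
        try:
            # Resolves when reply_pump() completes the Future.
            return await asyncio.wait_for(pending[correlation_id], timeout=55)
        finally:
            pending.pop(correlation_id, None)  # avoid leaks on timeout

    async def reply_pump(sb: ServiceBusClient) -> None:
        """Step 4: the broker fans replies in; complete the matching Future."""
        async with sb.get_subscription_receiver("ReplyTopic", POD_ID) as receiver:
            async for msg in receiver:
                fut = pending.get(msg.correlation_id)
                if fut is not None and not fut.done():
                    fut.set_result(json.loads(str(msg)))
                await receiver.complete_message(msg)
    ```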

    Why this is the ultimate scaling fix:

    • Zero Hot-Path Latency: You execute zero control-plane operations during a request.
    • Massive Scale: If you scale to 500 gateway pods, you only have 500 subscriptions on the topic—well below the 2,000 hard limit.
    • No sticky routing needed: Standard load balancers can route the incoming HTTP request to any pod, and Service Bus handles routing the reply back to the exact pod holding that client's open HTTP connection.
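
    Purely as an illustration, the pieces sketched above could be wired into a FastAPI app like this, so the pump runs in the background for the life of the pod:

    ```python
    # Illustrative wiring for the sketch above (FastAPI is an assumption).
    import asyncio
    from contextlib import asynccontextmanager

    from azure.servicebus.aio import ServiceBusClient
    from fastapi import FastAPI

    sb = None  # ServiceBusClient, created at startup

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        global sb
        create_pod_subscription()                   # one control-plane call, at startup
        sb = ServiceBusClient.from_connection_string(CONN)
        pump = asyncio.create_task(reply_pump(sb))  # background fan-in pump
        yield
        pump.cancel()
        await sb.close()

    app = FastAPI(lifespan=lifespan)

    @app.post("/analyze")
    async def analyze(payload: dict) -> dict:
        # The HTTP connection stays open right here until the reply arrives.
        return await handle_http_request(sb, payload)
    ```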

    (Note: Since your AI workers take 45+ seconds, ensure your edge Load Balancers/Ingress controllers have their idle timeout increased to 60s+, otherwise the client will receive a 504 Gateway Timeout before the Service Bus reply even makes it back to the pod! If you cross the 60s boundary, you may need to abandon Sync-over-Async entirely and adopt an Asynchronous HTTP 202 Polling pattern).

    Hope this helps you unblock your scale-out safely. Let me know in the comments if it works.

    Note: This response is drafted with the help of AI systems.

