Recurring Allocation Failures for NCasT4_v3 GPU VMs in Sweden Central – Capacity Stability, Reservations, and Cross-Region Migration Guidance

Question


Esraa Sayed 0

Hello,

We are experiencing recurring allocation failures with GPU virtual machines in the Sweden Central region and would like your assistance in clarifying the root cause, expected stability, and recommended mitigation options.

1. Environment Overview

VM Series: NCasT4_v3
Region: Sweden Central
Availability Zones Used: AZ 1 (initially), AZ 2 (subsequently)
Usage Pattern:
- VMs are shut down daily for approximately 8 hours.
- VMs are then restarted for active development use. The allocation error occurs at this start and on any later restart attempt during the day.

2. Issue Description

Earlier this week, we started encountering allocation failures when attempting to start existing NCasT4_v3 VMs in AZ 1, with errors indicating insufficient resources in the availability zone.

To mitigate this:

We created images from the affected VMs.
We attempted to recreate the VMs in AZ 2 and reattach the same network interfaces.
Despite this, we continue to face intermittent allocation failures with the same VM size in AZ 2 as well.

This behavior makes VM availability unpredictable and directly impacts our development teams, as we cannot reliably bring the environments online when needed.

3. Migration to Another Region (If Required)

If Sweden Central GPU capacity is expected to remain constrained, we would like guidance on migrating to a region with more stable GPU availability.

Specifically:

The recommended approach to migrate an environment across different regions consisting of:
- 2 VMs
- Managed OS and data disks
- Network interfaces within the same virtual network

4. Networking Constraints

We attempted to use VM images for migration but encountered the following limitations:

VM images cannot be directly used across regions.
Network interfaces cannot be migrated while preserving:
- The same private IPs
- The same public IPs

Please confirm:

Whether there is a supported approach to migrate VMs across regions while retaining IP addresses (private and/or public), or
Whether IP reassignment is unavoidable in cross-region migrations.

Thanks in advance

2 answers


Answer 1

The recurring allocation failures you are experiencing with NCasT4_v3 GPU VMs in the Sweden Central region can be attributed to insufficient resources in the availability zones you are utilizing. This is a common issue when demand exceeds the available capacity for specific VM sizes in a given region or zone.

Recommendations for Mitigation:

Retry Allocation: Sometimes, simply retrying the allocation request can help, as resources may become available after a short period.
Resize the VM: Consider resizing your VMs to a different size that may have more availability in the region or zone. This can help alleviate the allocation issues.
Change the Region or Zone: If the current region or zone continues to experience high demand, you might want to deploy or migrate your VMs to a different region or availability zone where there is more capacity. This can be done by creating a new VM using a copy of the OS disk in the new zone or region.
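The retry recommendation above can be automated rather than done by hand. The following is a minimal sketch, not a definitive implementation: `start_vm` is a hypothetical callable that wraps whatever tooling you use to start the VM, `StartError` is a hypothetical exception type, and the set of error codes treated as capacity failures is an assumption that should be adjusted to what your tooling actually reports.

```python
import random
import time

# Assumption: error codes that indicate a capacity shortage rather than a
# configuration or permission problem. Adjust to match your tooling.
CAPACITY_ERROR_CODES = {"AllocationFailed", "ZonalAllocationFailed"}


class StartError(Exception):
    """Hypothetical exception carrying the error code of a failed start request."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def start_with_retry(start_vm, max_attempts=5, base_delay=30.0,
                     max_delay=600.0, sleep=time.sleep):
    """Call start_vm() until it succeeds, backing off after capacity failures.

    Returns the number of attempts used. Errors that are not capacity
    failures, and the last capacity failure, are re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            start_vm()
            return attempt
        except StartError as err:
            if err.code not in CAPACITY_ERROR_CODES or attempt == max_attempts:
                raise
            # Exponential backoff, capped at max_delay, with up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

Retrying only helps when capacity frees up within the retry window; if every attempt fails, resizing or moving to another zone or region remains the fallback.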

Migration Guidance:

For migrating your environment across different regions while retaining managed OS and data disks, as well as network interfaces, you can follow these steps:

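Create a Snapshot: First, create snapshots of your managed OS and data disks.
Copy the Snapshot: Snapshots are regional resources, so copy each snapshot to the target region before using it there.
Create a VM from Snapshot: Create managed disks from the copied snapshots and use them to create new VMs in the target region.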
Networking Considerations: Unfortunately, when migrating VMs across regions, you cannot retain the same private or public IP addresses. You will need to reassign IP addresses in the new region, as there are no supported methods to migrate IP addresses while preserving them.
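One way the snapshot-based steps above might be scripted is sketched below. It only builds Azure CLI argument lists and does not run them. It assumes incremental snapshots, which can be copied to another region, and all resource names are hypothetical placeholders. Check the flags against the current Azure CLI documentation before use.

```python
import shlex


def migration_commands(src_rg, dst_rg, disk_name, target_region):
    """Return Azure CLI argument lists for moving one managed disk across regions.

    All names are illustrative. The source snapshot's resource ID is left as a
    placeholder because it is only known after the first command has run.
    """
    src_snap = f"{disk_name}-snap"
    dst_snap = f"{disk_name}-snap-{target_region}"
    src_snap_id = f"<resource ID of {src_snap}>"
    return [
        # 1. Incremental snapshot of the source disk, in the source region.
        ["az", "snapshot", "create", "--resource-group", src_rg,
         "--name", src_snap, "--source", disk_name, "--incremental", "true"],
        # 2. Copy of that snapshot into the target region.
        ["az", "snapshot", "create", "--resource-group", dst_rg,
         "--name", dst_snap, "--location", target_region,
         "--source", src_snap_id, "--incremental", "true",
         "--copy-start", "true"],
        # 3. New managed disk in the target region from the copied snapshot.
        ["az", "disk", "create", "--resource-group", dst_rg,
         "--name", f"{disk_name}-{target_region}",
         "--location", target_region, "--source", dst_snap],
    ]


if __name__ == "__main__":
    # Hypothetical resource group, disk, and region names.
    for cmd in migration_commands("rg-dev-src", "rg-dev-dst", "vm1-osdisk", "targetregion"):
        print(shlex.join(cmd))
```

The same sequence would be repeated for each OS and data disk. The new VMs are then created in the target region from the resulting disks, with new network interfaces in a virtual network in that region.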

Conclusion:

Given the current capacity constraints in the Sweden Central region, it may be beneficial to explore other regions for more stable GPU availability. If IP reassignment is unavoidable, plan for this in your migration strategy to minimize disruption to your development teams.


Answer 2

Hello @Esraa Sayed,

Whether there is a supported approach to migrate VMs across regions while retaining IP addresses (private and/or public), or whether IP reassignment is unavoidable in cross-region migrations.

Unfortunately, it's not possible to keep the same IP addresses because of platform limitations with Azure Networking services.

Explanation:

-> Private IPs are linked to subnets within a VNet, and VNets are specific to regions. If you move or recreate a VM in a different region, it must use a new VNet, so the original private IP can't be kept.

-> Azure Public IPs are also region-specific. A public IP created in the Sweden Central region can't be transferred or used with a VM in another region, so it can't be retained during migration.
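Since new addresses have to be assigned, it can help to plan the mapping up front. The sketch below is one possible convention, not an Azure feature: it keeps each VM's host offset within its subnet, so that for example the address ending in .4 in the old subnet maps to the address ending in .4 in the new one. The subnet ranges are hypothetical.

```python
import ipaddress


def remap_private_ip(old_ip, old_subnet, new_subnet):
    """Map old_ip to the address with the same host offset in new_subnet."""
    old_net = ipaddress.ip_network(old_subnet)
    new_net = ipaddress.ip_network(new_subnet)
    ip = ipaddress.ip_address(old_ip)
    if ip not in old_net:
        raise ValueError(f"{old_ip} is not inside {old_subnet}")
    # Distance of the address from the start of its subnet.
    offset = int(ip) - int(old_net.network_address)
    if offset >= new_net.num_addresses:
        raise ValueError(f"{new_subnet} is too small to hold offset {offset}")
    return str(new_net.network_address + offset)


# Hypothetical source subnet 10.0.1.0/24 and target subnet 10.20.1.0/24.
print(remap_private_ip("10.0.1.4", "10.0.1.0/24", "10.20.1.0/24"))  # 10.20.1.4
```

A predictable mapping like this keeps firewall rules and configuration files easier to update after the move.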

How can you migrate without affecting users?

Rather than using the raw IP, consider placing the VM behind one of the following:

Azure Load Balancer
Azure Application Gateway
Azure Front Door (global)

Direct users to a DNS name instead of the IP address.

During migration, simply update the backend, and users will not experience any changes.
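On the client side, the DNS-based approach means tools and scripts look up the name each time they connect instead of storing an address. A minimal sketch, assuming a hypothetical hostname such as `dev-gpu.example.com` and SSH on port 22:

```python
import socket


def resolve_endpoint(hostname, port=22):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" stands in for the real DNS name here.
    print(resolve_endpoint("localhost"))
```

After the migration, only the DNS record or the backend behind it changes. Clients that resolve the name on each connection pick up the new address once any cached record expires.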

Hope it helped to answer your query, let me know if there are any more questions around it.

Please do not forget to click "Accept Answer" and "Yes" if this helped, as it can be informative to other community members as well.
