VM was down due to error "We're sorry, your virtual machine is unavailable because of connectivity loss to the remote disk"

Article
2018-06-26

Question

_{Tuesday, June 26, 2018 7:52 AM}

Hi,

Our Windows Server 2012 R2 VM shut down unexpectedly and restarted automatically after some time.

This is our production VM, and when we checked Event Viewer, the log showed that the VM had been hard restarted.

When we checked the Resource Health logs in the Azure portal, we found that the issue was on the Azure side, with the description "We're sorry, your virtual machine is unavailable because of connectivity loss to the remote disk"

Can you tell us why this happened and what we can do to prevent it from happening in the future? Please help.

Regards,

Viresh.

VireshMathapati13

All replies (6)

_{Tuesday, June 26, 2018 8:03 AM}

Azure virtual machines (VMs) might sometimes reboot for no apparent reason, without evidence of your having initiated the reboot operation. Check this link for more detailed information on the actions and events that can cause VMs to reboot. It also provides insight into how to avoid unexpected reboot issues or reduce their impact.


_{Tuesday, June 26, 2018 8:26 AM}

I checked the link you shared but found nothing about the "connectivity loss to the remote disk" reason.

VireshMathapati13

_{Tuesday, June 26, 2018 7:06 PM}

What region is your VM hosted in? We had an issue yesterday in South Central US:

https://azure.microsoft.com/en-us/status/history/

Multiple Services - South Central US

Summary of impact: Between 19:40 and 19:41 UTC on 25 Jun 2018, a subset of customers in South Central US may have experienced difficulties connecting to resources and/or 500-level errors hosted in this region. Virtual Machines may have rebooted unexpectedly. Impacted services included: Storage, Virtual Machines, Key Vault, Site Recovery, Machine Learning, Cloud Shell, Logic Apps, Redis Cache, Visual Studio Team Services, Service Bus, ExpressRoute, Application Insights, Backup, Networking, API Management, App Service (Linux) and App Service.

Preliminary root cause: The load on a single storage scale unit changed unexpectedly and the mechanism to handle such load changes didn't respond fast enough. This caused congestion on some of the backend roles and led to further load imbalance, resulting in timeouts and increased latencies.

Mitigation: Engineers applied mitigation on the scale unit to prevent this from recurring while continuing to investigate full root cause, validating that this was isolated to a single scale unit.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences. A full root cause analysis will be provided within approximately 72 hours.

_{Wednesday, June 27, 2018 11:40 AM}

The VM is hosted in the East US 2 region.

VireshMathapati13

_{Wednesday, June 27, 2018 3:37 PM}

Thanks for that. Can you send me an email to [email protected] and provide me with the following:

Subscription ID

Link to this thread

VM name

Resource Group Name

Time/date of the unexpected shutdown

I can take a look to see what happened.

_{Thursday, June 28, 2018 6:37 PM}

Thanks for the email. I am putting the RCA on this page as well so others can reference it if they see a similar issue.

I am able to see the VM restart. It appears this reboot was due to a disk fault. When a disk fault is found, we restart the VM to restore connectivity and prevent any data loss.

The platform was able to self-mitigate this fault and after the reboot the machine came back online and was fully functional.

To avoid issues like this in the future, you could look into setting up your SQL Server for high availability. We also have some great suggestions in the SQL best practices doc:

/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-performance

Besides that, I would also suggest you consider moving your disks to managed disks. This allows the platform to manage the storage accounts for you and can reduce possible issues you might have by storing all VHDs in a single storage cluster.
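As a rough way to see which VMs still use unmanaged disks and whether their VHDs are concentrated in one storage account, something like the following Python sketch could help. It assumes the input is a list of dictionaries shaped like the JSON that `az vm list` returns, where an unmanaged disk carries a `vhd.uri` field. The function name, the sample VM names, and the storage account name are all made up for illustration, so treat this as a starting point rather than a supported tool.

```python
from collections import defaultdict
from urllib.parse import urlparse


def unmanaged_vhds_by_account(vms):
    """Group unmanaged VHD URIs by storage account name.

    Assumption: each item in `vms` looks like one entry of `az vm list`
    JSON output, with disks under `storageProfile`.
    """
    by_account = defaultdict(list)
    for vm in vms:
        profile = vm.get("storageProfile") or {}
        disks = [profile.get("osDisk")] + list(profile.get("dataDisks") or [])
        for disk in disks:
            if not disk:
                continue
            uri = (disk.get("vhd") or {}).get("uri")
            if uri:
                # Unmanaged disks point at a VHD blob; the first label of the
                # blob hostname is the storage account name.
                account = urlparse(uri).hostname.split(".")[0]
                by_account[account].append((vm.get("name"), uri))
    return dict(by_account)


# Hypothetical sample data: one VM on unmanaged disks, one on managed disks.
sample_vms = [
    {
        "name": "prod-sql-01",
        "storageProfile": {
            "osDisk": {
                "vhd": {"uri": "https://examplestore1.blob.core.windows.net/vhds/os.vhd"},
                "managedDisk": None,
            },
            "dataDisks": [
                {"vhd": {"uri": "https://examplestore1.blob.core.windows.net/vhds/data.vhd"}}
            ],
        },
    },
    {
        "name": "web-01",
        "storageProfile": {
            "osDisk": {"vhd": None, "managedDisk": {"id": "example-managed-disk-id"}},
            "dataDisks": [],
        },
    },
]

result = unmanaged_vhds_by_account(sample_vms)
for account, disks in result.items():
    print(f"{account}: {len(disks)} unmanaged VHD(s)")
```

Any storage account that shows up with many VHDs from production VMs is a candidate for conversion to managed disks.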
