Linux VM unreachable via SSH (Connection refused) and VM Agent offline after resize to D4s v5

Marco Andersohn 20 Reputation points
2026-05-18T12:27:49.5833333+00:00

Hello everyone,

I am facing a critical issue with my Linux VM after upgrading its size to a Standard D4s v5 instance. Since the migration, the VM has become completely inaccessible.

The Symptoms

  • SSH connections are actively rejected with a "Connection refused" error.
  • The Azure Portal displays a warning stating that the VM Agent is unavailable or not installed.

What I have already verified

  • The public IP address and my credentials are correct.
  • The operating system disk is definitely not full.

Gemini's idea: Since both the SSH daemon and the Azure VM Agent went offline simultaneously right after the resize, I highly suspect the operating system is stuck during the boot sequence, possibly dropping into Emergency Mode due to hardware UUID changes or network interface remapping.

Since I am on a basic support plan and cannot open a technical ticket, I would greatly appreciate any insights from the community on how to diagnose or force-recover the boot sequence.

Thank you in advance,

Marco

Azure Virtual Machines

An Azure service that is used to provision Windows and Linux virtual machines.


Answer accepted by question author

Jilakara Hemalatha 13,750 Reputation points Microsoft External Staff Moderator
2026-05-18T13:15:13.1366667+00:00

Hello Marco,

Thank you for sharing the detailed information.

Based on the behavior observed after resizing the VM to the Standard D4s_v5 SKU, the issue appears to be related to the Linux operating system not completing the boot process successfully. Since both SSH access and the Azure Linux VM Agent became unavailable immediately after the resize operation, this usually indicates that critical services are failing during startup or the VM has entered emergency mode.

The “Connection refused” error suggests that the VM is still reachable over the network, but the SSH service itself is not running or the OS boot sequence is incomplete. In similar scenarios, this can occur due to network interface remapping after the resize, invalid mount entries in /etc/fstab, filesystem-related startup failures, or issues with the Azure Linux VM Agent during boot.
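To make the refused-versus-unreachable distinction above concrete, here is a minimal client-side sketch. The function name probe_tcp is hypothetical (it is not an Azure tool), and it assumes bash and the coreutils timeout command are available on the machine you connect from. A refusal comes back immediately, while traffic that is silently dropped on the way usually ends in a timeout.

```shell
#!/usr/bin/env bash
# probe_tcp HOST PORT [SECONDS]  (hypothetical helper, not an Azure tool)
# Prints one of: open, refused, timeout.
probe_tcp() {
  local host=$1 port=$2 wait_s=${3:-5}
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
  # timeout(1) exits with status 124 if the attempt outlives SECONDS.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
  case $? in
    0)   echo open ;;
    124) echo timeout ;;
    *)   echo refused ;;   # immediate failure; unresolvable names also land here
  esac
}
```

Calling it as probe_tcp <public-ip> 22 and getting "refused" would be consistent with the VM being reachable while nothing is listening on the SSH port.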

As a first step, we recommend enabling and reviewing Boot Diagnostics from the Azure portal under: VM → Support + Troubleshooting → Boot diagnostics

Please check the console screenshot and serial logs for any filesystem errors, emergency mode messages, mount failures, or kernel-related errors.

Documentation: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/boot-diagnostics?
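If you prefer to check the serial log offline, the sketch below greps a saved copy for the kinds of messages mentioned above. It assumes the log was saved to a local file first, for example with az vm boot-diagnostics get-boot-log. The function name scan_boot_log is hypothetical, and the pattern list is only a starting point: the "Dependency failed", "Timed out waiting for device" and "Kernel panic" patterns are my additions, not an official list.

```shell
#!/usr/bin/env bash
# Save the serial log first, for example:
#   az vm boot-diagnostics get-boot-log --resource-group <yourResourceGroup> --name <yourVM> > serial.log

# scan_boot_log FILE  (hypothetical helper)
# Prints matching lines with line numbers; returns non-zero when nothing matches.
scan_boot_log() {
  local log_file=$1
  grep -n -i -E \
    -e 'emergency mode' \
    -e 'EXT4-fs.*error' \
    -e 'Dependency failed for' \
    -e 'Timed out waiting for device' \
    -e 'Kernel panic' \
    "$log_file"
}
```

An empty result does not prove the boot was clean; it only means none of these particular signatures appeared.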

We also recommend accessing the VM through Azure Serial Console to verify the current boot state and collect additional logs. Once connected, please review the status of the SSH service, Azure Linux Agent, network interfaces, and filesystem configuration using the following commands:

systemctl status sshd
systemctl status walinuxagent
journalctl -xb
ip addr
cat /etc/fstab
blkid

Documentation: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/serial-console-linux
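Comparing cat /etc/fstab against blkid by eye is error-prone, so here is a rough sketch that lists UUIDs referenced in fstab that no attached device reports. The function name fstab_missing_uuids is hypothetical. It only handles UUID= entries (not LABEL= or /dev paths), and the optional second argument (a file containing saved blkid output) is only there so the check can be tried without root.

```shell
#!/usr/bin/env bash
# fstab_missing_uuids FSTAB [BLKID_OUTPUT_FILE]  (hypothetical helper)
# Prints every UUID referenced in FSTAB that blkid does not report.
fstab_missing_uuids() {
  local fstab=$1 blkid_file=${2:-}
  local known uuid
  if [ -n "$blkid_file" ]; then
    known=$(cat "$blkid_file")
  else
    known=$(blkid)              # usually needs root for complete output
  fi
  # Skip comment lines; field 1 of an fstab entry is the device spec.
  awk '!/^[[:space:]]*#/ && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); gsub(/"/, "", $1); print $1 }' "$fstab" |
  while read -r uuid; do
    case $known in
      *" UUID=\"$uuid\""*) ;;   # leading space avoids matching PARTUUID="..."
      *) echo "$uuid" ;;
    esac
  done
}
```

From Serial Console this could be run as sudo bash -c with the function pasted in, then fstab_missing_uuids /etc/fstab. Any UUID it prints belongs to a mount entry that may be blocking boot.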

If the VM still remains inaccessible, you may also try performing a VM Redeploy operation, which moves the VM to a new Azure host while preserving the OS disk and configuration.

Documentation: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/redeploy-to-new-node-linux?

In case the issue persists, the recommended recovery approach would be to use the Azure VM Repair workflow to attach the OS disk to a temporary repair VM and correct the configuration offline.

Documentation: Repair a Linux VM using Azure VM Repair commands
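As a rough outline of that repair workflow, the sketch below only prints the Azure CLI commands rather than running them, so the sequence can be reviewed first. The function name repair_plan is hypothetical. The commands assume the vm-repair CLI extension, and az vm repair create will prompt for repair VM credentials if none are passed.

```shell
#!/usr/bin/env bash
# repair_plan RESOURCE_GROUP VM_NAME  (hypothetical helper)
# Prints, but does not execute, the commands for one offline repair cycle.
repair_plan() {
  local rg=$1 vm=$2
  printf '%s\n' \
    "az extension add --name vm-repair" \
    "az vm repair create --resource-group $rg --name $vm --verbose" \
    "# ...fix /etc/fstab or other configuration on the disk attached to the repair VM..." \
    "az vm repair restore --resource-group $rg --name $vm --verbose"
}
```

Review the printed commands against the linked documentation before running any of them. The restore step swaps the repaired disk back into the original VM.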

Additionally, the following troubleshooting guidance may also help:

Troubleshoot Linux VM boot issues

Azure Linux VM Agent overview

Hope this helps! Please let me know if you have any queries.

1 additional answer

  1. AI answer

    2026-05-18T12:28:19.6266667+00:00

    Given that SSH is refused and the VM Agent shows as unavailable right after a resize, the most likely causes are:

    • SSH daemon not running or not listening on the expected port/interface.
    • OS stuck or failing during boot (for example, due to filesystem or device/UUID issues), so services like sshd and the agent never start.

    Use the following sequence to diagnose and recover.

    1. Use Boot Diagnostics and Serial Console

    1. In the Azure portal, open the VM and check Boot diagnostics for any filesystem or boot errors.
    2. Open the Azure VM Serial Console. This provides an interactive shell even when SSH is down.
    3. If the VM is in emergency mode or shows filesystem errors, follow the filesystem troubleshooting guidance. Look for messages similar to:
      • EXT4-fs ... Filesystem error recorded from previous mount
      • You are in emergency mode. After logging in, type "journalctl -xb" ...

    If filesystem errors are present, follow the documented recovery steps for Linux VM boot issues caused by filesystem errors.

    2. Check SSH service and port from Serial Console

    Once logged in via Serial Console:

    1. Check sshd status:
         sudo systemctl status sshd.service
      
      • If Active is not running, investigate the error shown and restart:
         sudo systemctl restart sshd.service
      
    2. Verify sshd is listening:
         sudo ss --listen --tcp --process --numeric | grep sshd
      
      • Confirm it is listening on 0.0.0.0:22 or the configured SSH port.
    3. Confirm the configured SSH port:
         grep -i port /etc/ssh/sshd_config
      
      or
         grep -i listen /etc/ssh/sshd_config
      
      • If nothing is returned or only commented lines appear, sshd defaults to port 22.

    If sshd is not running or not listening, correct the configuration (for example, fix syntax errors in sshd_config), then restart sshd and test SSH again.
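The two grep commands above also return commented lines, so here is a small sketch that reports only the port values that are actually in effect in a single config file. The function name sshd_effective_ports is hypothetical, and it does not follow Include directives (for example files under /etc/ssh/sshd_config.d). For the authoritative answer, sudo sshd -t checks the configuration for errors and sudo sshd -T prints the effective settings.

```shell
#!/usr/bin/env bash
# sshd_effective_ports CONFIG  (hypothetical helper)
# Prints each uncommented Port value, or 22 when the file sets none.
sshd_effective_ports() {
  local config=$1 ports
  # Keywords in sshd_config are case-insensitive; "#Port 22" never matches
  # because its first field is "#Port" (or "#"), not "port".
  ports=$(awk 'tolower($1) == "port" { print $2 }' "$config")
  echo "${ports:-22}"
}

# Typical follow-up checks from Serial Console (not run by this sketch):
#   sudo sshd -t                      # exits non-zero and names the bad line on error
#   sudo sshd -T | grep -i '^port'    # effective port(s), Include files applied
```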

    3. Repair common SSH directory permission issues

    If logs show errors like:

    • On RHEL: /var/empty/sshd must be owned by root and not group or world-writable.
    • On SUSE: /var/lib/empty must be owned by root and not group or world-writable.
    • On Ubuntu: /var/run/sshd must be owned by root and not group or world-writable.

    Fix them from Serial Console:

    • RHEL:
        sudo mkdir -p /var/empty/sshd
        sudo chmod 755 /var/empty/sshd
        sudo chown root:root /var/empty/sshd
      
    • SUSE:
        sudo mkdir -p /var/lib/empty
        sudo chmod 755 /var/lib/empty
        sudo chown root:root /var/lib/empty
      
    • Ubuntu:
        sudo mkdir -p /var/run/sshd
        sudo chmod 755 /var/run/sshd
        sudo chown root:root /var/run/sshd
      

    Then restart sshd and test SSH.

    If Serial Console login is not possible (for example, only SSH keys were configured), use Run Command from the Azure portal:

    1. In the VM blade, go to Run Command under Operations.
    2. Run RunShellScript with the appropriate script for the distribution (same commands as above). The VM Agent must be in the Ready state for this to work, so Run Command is not an option while the portal still reports the agent as unavailable.

    4. Reset SSH configuration and credentials

    If sshd configuration is suspected to be corrupted or misconfigured:

    1. In the Azure portal, open the VM.
    2. Under Help, select Reset password.
    3. Choose Reset configuration only and select Update to reset SSH configuration to defaults.
    4. Optionally, reset or create a user with SSH key or password using Reset SSH public key or Reset password.

    After the reset completes, test SSH again.

    5. Restart or redeploy the VM

    If the VM appears to boot but SSH and the agent remain offline:

    1. Restart the VM:
      • Azure portal: select the VM and choose Restart.
      • Azure CLI:
             az vm restart --resource-group <yourResourceGroup> --name <yourVM>
        
    2. If the issue persists, redeploy the VM to a new host node:
      • Azure portal: under Help, select Redeploy.
      • Azure CLI:
             az vm redeploy --resource-group <yourResourceGroup> --name <yourVM>
        
      Note: Redeploying moves the VM to a new node. Ephemeral disk data is lost and dynamic IP addresses are updated.

    6. Verify network path (NSG and routing)

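    Once sshd is confirmed running and listening, if the client still cannot connect, verify the network path. Note that traffic dropped by an NSG normally times out rather than returning "Connection refused":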

    • NSG rules allow inbound SSH (default port 22) to the VM.
    • Use IP flow verify to confirm no NSG rule is blocking SSH.
    • Use Next hop to ensure routing is correct and traffic reaches the VM.

    These steps collectively address SSH refusal, VM Agent offline status, and boot issues after a resize by using Serial Console, Run Command, SSH reset, restart/redeploy, and network verification.

