S50 disk (4000GB HDD + PerfPlus) became slow (<= 15MBps sequential writes)

Pieter Bas Hofstede 20 Reputation points
2025-12-01T10:49:51.96+00:00

Hi all,

We're running an FX12mds_v2 VM (uncached Premium SSD throughput of 1,636 and 1,750).

Two weeks ago, copying files from the local NVMe drive to the HDD managed disk (S50 4000 GB HDD + PerfPlus) performed as we expected: about 200 to 300 MBps during peaks lasting 30 minutes, and about 40 to 60 MBps between peaks. Peaks were active about 75% of the time; the 'low' rates applied about 25% of the time. No real complaints.

But for the past 7 days, peaks have topped out at 100 MBps and occur only about 5% of the time; the other 95% of the time the copy runs at about 15 MBps. This is becoming problematic. I expect a higher baseline performance from an S50-tier disk with PerfPlus enabled. Is that expectation correct?
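To put the two periods side by side, a time-weighted average of the quoted rates gives a rough effective throughput. This is a back-of-the-envelope sketch: it assumes the midpoints of the quoted ranges (250 MBps and 50 MBps) for the earlier period and uses 100 MBps, the stated maximum, for the recent peaks, so the recent figure is an upper bound.

```python
# Rough effective (time-weighted) throughput from the reported duty cycles.
# ASSUMPTION: range midpoints stand in for the quoted ranges; results are estimates.

def effective_mbps(phases):
    """phases: list of (fraction_of_time, rate_mbps); fractions must sum to 1."""
    total = sum(fraction for fraction, _ in phases)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"time fractions sum to {total}, expected 1.0")
    return sum(fraction * rate for fraction, rate in phases)

# Two weeks ago: 200-300 MBps (midpoint 250) ~75% of the time, 40-60 MBps (midpoint 50) ~25%.
before = effective_mbps([(0.75, 250.0), (0.25, 50.0)])

# Last 7 days: at most 100 MBps ~5% of the time, ~15 MBps ~95% of the time.
after = effective_mbps([(0.05, 100.0), (0.95, 15.0)])

print(f"before: ~{before:.2f} MBps, after: <= ~{after:.2f} MBps, ratio: ~{before / after:.1f}x")
```

With these inputs the estimate drops from about 200 MBps to about 19 MBps, roughly a tenfold reduction.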

Characteristics

  • Average response time for this disk, as reported within Windows during the copy, is between 700 ms and 1800 ms.
  • The VM is not under significant load.
  • Other remote storage (SSDv2) is nearly idle.
  • The HDD is 70% full. Older files are deleted automatically (rotation).
  • We see frequent log entries stating that process C:\Windows\explorer.exe is delaying system shutdown after 5016 milliseconds.
  • I/O on the C: system drive is kept as low as possible. The pagefile is on the NVMe temp drive.
  • The file copy is the only activity on the HDD at that time.
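The response times above can be related to the observed throughput with Little's law (average outstanding I/Os = completion rate × average response time). The sketch assumes a 1 MB write size, which the post does not state; the copy tool's real I/O size may differ, and the implied queue depth scales inversely with it. The function name is hypothetical.

```python
# Little's law: average outstanding I/Os = completion rate (IOPS) x average response time.
# ASSUMPTION: the write size is not stated in the post; 1 MB is a placeholder.

def implied_outstanding_ios(throughput_mbps: float, io_size_mb: float, response_ms: float) -> float:
    """Average number of I/Os in flight implied by throughput and response time."""
    if io_size_mb <= 0:
        raise ValueError("io_size_mb must be positive")
    iops = throughput_mbps / io_size_mb        # completions per second
    return iops * (response_ms / 1000.0)       # average I/Os in flight

# Observed during the slow copy: ~15 MBps with 700-1800 ms average response time.
for response_ms in (700.0, 1800.0):
    queue_depth = implied_outstanding_ios(15.0, io_size_mb=1.0, response_ms=response_ms)
    print(f"{response_ms:.0f} ms -> ~{queue_depth:.1f} outstanding I/Os")
```

Under that assumption, 700 to 1800 ms at 15 MBps corresponds to roughly 10 to 27 writes in flight. A queue of that depth completing only about 15 writes per second suggests the requests are waiting on the disk rather than on the VM.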

Problem

Backup files of about 250 GB, placed on the temporary local NVMe disk, are moved to the HDD with Move-Item in PowerShell. Two weeks ago performance was good (200 to 300 MBps); last week it was very poor (14 MBps).
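For scale, the time to move one 250 GB backup set at a sustained rate is its size divided by throughput. The sketch assumes decimal units (1 GB = 1000 MB) and ignores per-file overhead of Move-Item, so treat the results as approximate; `transfer_hours` is a hypothetical helper.

```python
# Approximate wall-clock time to move one ~250 GB backup set at a sustained rate.
# ASSUMPTIONS: 1 GB = 1000 MB; no per-file overhead; rate is constant for the whole move.

def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    if rate_mbps <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1000.0 / rate_mbps
    return seconds / 3600.0

for label, rate in [("good, 300 MBps", 300.0), ("good, 200 MBps", 200.0), ("poor, 14 MBps", 14.0)]:
    print(f"{label}: {transfer_hours(250.0, rate):.2f} h")
```

At 200 to 300 MBps the move takes roughly 14 to 21 minutes; at 14 MBps it takes about 5 hours.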

Already tried:

  • Created a brand-new HDD. During the first week it performed as expected; after that, performance became very poor.

Not tried yet:

  • Redeploying the VM to another node. I am about to try this. The SSDv2 storage is behaving as expected.
  • Enabling host read/write caching for the HDD (caching is currently off, and was also off when performance was OK).
Azure Virtual Machines

Answer accepted by question author
  1. Himanshu Shekhar 2,090 Reputation points Microsoft External Staff Moderator
    2025-12-01T13:55:09.8666667+00:00

    Hello @Pieter Bas Hofstede,

    Please find below the root cause of the issue you observed:

    The engineering team found a bug in one of the internal features used by the disk platform. It causes extra retries and processing on your end, which leads to higher disk latency. A permanent fix is in the works and will be rolled out soon. In the meantime, to improve performance, the product team suggests switching the affected disk from Standard HDD to Premium SSD, which offers higher throughput and lower latency for production workloads.


2 additional answers

  1. Q&A Assist
    2025-12-01T10:50:04.8866667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    The performance issue you are experiencing with the S50 disk (4000GB HDD + PerfPlus) could be attributed to several factors. Here are some considerations and troubleshooting steps you might find useful:

    1. Storage Performance Characteristics: Ensure that the disk's provisioned performance meets the throughput you need. For Azure Disk Storage, the expected performance for S50 tier disks should be higher than what you are currently experiencing.
    2. Disk Utilization: Your HDD is 70% full, which may affect performance. Consider whether the disk is being heavily used by other processes or whether fragmentation could be an issue.
    3. I/O Performance Monitoring: Use performance counters to monitor the average response time and throughput of the disk. High response times (700ms to 1800ms) indicate potential bottlenecks. If the disk is consistently slow, it may be worth investigating the I/O patterns and whether they are saturating the disk.
    4. Caching Options: You mentioned that host R/W-caching is currently off. Enabling caching could improve performance, especially for workloads that involve frequent read/write operations.
    5. VM Node Deployment: Since you are considering re-deploying the VM to another node, this could help if the current node has underlying performance issues. Testing the performance on a different node can help isolate the problem.
    6. Temporary Local NVMe Disk: If the temporary local NVMe disk is performing well, consider whether the data transfer methods (like Move-Item in PowerShell) are optimized for performance. Sometimes, using different methods or tools for file transfer can yield better results.
    7. Monitoring Logs: The logging related to explorer.exe delaying system shutdown may indicate that there are processes that are not completing efficiently. Investigating these logs could provide insights into any underlying issues affecting performance.
    8. Performance Testing: If possible, run a performance test with a tool such as DiskSpd to establish a baseline for the disk and identify any discrepancies.
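    As a stand-in for a DiskSpd run, a minimal single-threaded sequential write test can be sketched in Python; it is not DiskSpd and does not reproduce its options. It writes fixed-size chunks, flushes each one with `os.fsync`, and reports throughput and per-chunk latency. Because only one write is in flight at a time, absolute numbers will be lower than a multi-threaded copy achieves, but repeated runs make a rising latency trend visible. The function name, target path, and sizes are hypothetical.

```python
import os
import statistics
import time


def sequential_write_test(path: str, total_mib: int = 1024, chunk_mib: int = 1) -> dict:
    """Write total_mib MiB to `path` in chunk_mib-sized pieces, flushing each chunk to disk.

    Hypothetical helper for illustration. Returns throughput (MiB/s) and per-chunk
    latency statistics (ms). The test file is deleted afterwards.
    """
    if chunk_mib <= 0 or total_mib < chunk_mib:
        raise ValueError("need chunk_mib > 0 and total_mib >= chunk_mib")
    chunk = os.urandom(chunk_mib * 1024 * 1024)  # incompressible payload
    latencies_ms = []
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(total_mib // chunk_mib):
                t0 = time.perf_counter()
                f.write(chunk)
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to commit the data to the device
                latencies_ms.append((time.perf_counter() - t0) * 1000.0)
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)  # do not leave large test files on the data disk
    written_mib = len(latencies_ms) * chunk_mib
    return {
        "mib_written": written_mib,
        "mib_per_s": written_mib / elapsed if elapsed > 0 else float("inf"),
        "avg_latency_ms": statistics.mean(latencies_ms),
        "max_latency_ms": max(latencies_ms),
    }
```

    For example, `sequential_write_test(r"F:\write_test.bin", total_mib=4096)`, where `F:` stands in for the HDD data disk's drive letter.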

    By addressing these areas, you may be able to identify the root cause of the slow performance and take appropriate actions to mitigate the issue.


  2. Pieter Bas Hofstede 20 Reputation points
    2025-12-02T08:08:11.83+00:00

    Let's add some metrics from the Azure portal:

    Disk read + disk write bytes/sec:

    [image]

    Disk read + disk write IOPS:

    [image]

    VM data disk % bandwidth consumed + data disk % IOPS consumed:

    [image]
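    One way to read the consumed-percentage charts: if bandwidth or IOPS consumed sits near 100%, the disk is being capped at its provisioned limit; if both stay well below that while response times are high, capping does not explain the slowness. The sketch encodes that reading as a heuristic; the function name and both thresholds are illustrative assumptions, not Azure-defined values.

```python
def classify_disk_bottleneck(bandwidth_consumed_pct: float,
                             iops_consumed_pct: float,
                             avg_latency_ms: float,
                             cap_threshold_pct: float = 95.0,
                             high_latency_ms: float = 100.0) -> str:
    """Hypothetical heuristic over Azure portal disk metrics.

    Returns:
      "capped"         - bandwidth or IOPS consumption is at or near the provisioned limit
      "slow_below_cap" - well below the limit, yet response times are high
      "ok"             - below the limit and responsive
    """
    if max(bandwidth_consumed_pct, iops_consumed_pct) >= cap_threshold_pct:
        return "capped"
    if avg_latency_ms >= high_latency_ms:
        return "slow_below_cap"
    return "ok"


# Placeholder inputs, not readings taken from the charts above.
print(classify_disk_bottleneck(bandwidth_consumed_pct=10.0, iops_consumed_pct=5.0, avg_latency_ms=1200.0))
```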

    "For Azure Disk Storage, the expected performance for S50 tier disks should be higher than what you are currently experiencing."

    Yes, I agree. But how do we solve this? @Himanshu Shekhar, can we escalate this to an internal specialist?

