Event id 260 - Hyper-V-VmSwitch - Failed to move RSS queue

Question

Monday, November 25, 2019 8:03 PM

Hello!

So, I have a Windows Server 2019 Datacenter Hyper-V Failover Cluster with two Broadcom 10GbE interfaces in a Switch Embedded Team and RDMA enabled for Live Migration. Since I've put them in production I've noticed some intermittent messages like the ones below:

Failed to move RSS queue 1 from VMQ 3 of switch 752B1093-0029-4E22-8D90-FDFE839B99C2 (Friendly Name: SET_Team), ndisStatus = -1071448015 .

Failed to move RSS queue 8 from VMQ 3 of switch 752B1093-0029-4E22-8D90-FDFE839B99C2 (Friendly Name: SET_Team), ndisStatus = -1071448015 .

Failed to move RSS queue 9 from VMQ 3 of switch 752B1093-0029-4E22-8D90-FDFE839B99C2 (Friendly Name: SET_Team), ndisStatus = -1071448015 .

And so on.

They always happen within a short time frame, for about 10 minutes at a stretch, and I haven't noticed any degradation so far, but it worries me.

I have distributed RSS and VMQ across the system's CPUs as recommended, but I've never done it on a Server 2019 cluster before, so I'm worried I might have missed something.
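For context, by "distributed" I mean something along these lines (the adapter names "NIC1"/"NIC2" and the processor numbers below are just examples, not my actual values):

# Example only: spread RSS and VMQ processors so the two team members use different cores
Set-NetAdapterRss -Name "NIC1" -BaseProcessorNumber 2 -MaxProcessorNumber 14 -MaxProcessors 8
Set-NetAdapterRss -Name "NIC2" -BaseProcessorNumber 16 -MaxProcessorNumber 28 -MaxProcessors 8
Set-NetAdapterVmq -Name "NIC1" -BaseProcessorNumber 2 -MaxProcessors 8
Set-NetAdapterVmq -Name "NIC2" -BaseProcessorNumber 16 -MaxProcessors 8

# Check the resulting layout
Get-NetAdapterRss
Get-NetAdapterVmq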

Here are RSS and VMQ settings for the physical interfaces:



Any thoughts?

Regards,

Giovani

All replies (7)

Tuesday, December 17, 2019 9:05 AM

I have this problem too, same environment and same hardware. Could this be due to a driver issue?


Tuesday, December 17, 2019 1:28 PM

It appears to be related to d.VMMQ, specifically this behavior: "When network throughput is low: The system coalesces traffic received on a vmNIC to as few CPUs as possible".
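If you want to check whether the host vNICs are currently in Dynamic mode before changing anything, something like this should show it:

Get-VMNetworkAdapter -ManagementOS | Select-Object Name, VrssQueueSchedulingMode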

I have disabled it by running the command below on both hosts and haven't seen this message since 11/28.

Set-VMNetworkAdapter -ManagementOS -VrssQueueSchedulingMode StaticVrss

This command causes VMMQ to fall back to its previous Server 2016 behavior, keeping the queues static once allocated.

I also disabled it on the VMs by running:

Get-VM | Get-VMNetworkAdapter | Where-Object VrssQueueSchedulingMode -like Dynamic | Set-VMNetworkAdapter -VrssQueueSchedulingMode StaticVrss

Keep in mind that the VM configuration version needs to be at least 7.1 (I think; I upgraded them to 9.0 anyway) for this command to work.
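If you need to check or bump the configuration version first, something along these lines works (the VM has to be shut down for the upgrade; "MyVM" is just a placeholder name):

# List current configuration versions
Get-VM | Select-Object Name, Version

# Upgrade one VM (it must be powered off first)
Stop-VM -Name "MyVM"
Update-VMVersion -Name "MyVM" -Confirm:$false
Start-VM -Name "MyVM"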

I was also having an issue where one of my nodes would become unresponsive. I could not Live Migrate, start, or stop VMs residing on the affected node, and it would eventually BSOD because the cluster service would stop responding. I haven't seen this behavior since I made those changes either, so I think I'm on the right track here.

EDIT: I just noticed that on another cluster I didn't disable d.VMMQ on the VMs and I still get those errors there, so I guess it needs to be disabled at both the host and VM level.
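To double-check that the VM-level adapters actually picked up the change, this can be run on each node:

Get-VM | Get-VMNetworkAdapter | Select-Object VMName, Name, VrssQueueSchedulingMode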

Regards,

Giovani


Monday, February 17, 2020 10:56 AM

We have the same errors in our S2D cluster logs. All software and firmware is up to date. We are using FastLinQ QL45212H network adapters. We don't want to disable d.VMMQ. Does anyone know how to troubleshoot this issue?


Sunday, March 1, 2020 7:54 PM

I have this exact same issue with a nearly identical setup. I forwarded this post to the Microsoft support engineer assigned to my case. He requested that I test this solution on our dev cluster. Before doing so, I wanted to ask whether the 'Get-VM | Get-VMNetworkAdapter...' cmdlet needs to be run on every node in the cluster? And what about new VMs created after the command has been run?

Any other experience that could be provided would be very much appreciated. 


Tuesday, March 3, 2020 7:36 AM

Yes, it needs to be run on every node and rerun for every new VM. If migrating VMs from an older cluster, also upgrade their configuration to the latest version.
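If it helps, this is roughly how it could be applied to every node in one pass ("MyCluster" is just a placeholder, adjust for your environment):

# Example only: apply StaticVrss to the host vNICs and to all VMs hosted on each cluster node
$nodes = Get-ClusterNode -Cluster "MyCluster" | Select-Object -ExpandProperty Name
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Set-VMNetworkAdapter -ManagementOS -VrssQueueSchedulingMode StaticVrss
    Get-VM | Get-VMNetworkAdapter |
        Where-Object VrssQueueSchedulingMode -like Dynamic |
        Set-VMNetworkAdapter -VrssQueueSchedulingMode StaticVrss
}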

Both of my clusters have been stable for the last couple of months after running those commands. It might also have helped that I upgraded the Arcserve UDP servers backing up my environment to the latest version.

Regards,

Giovani


Tuesday, March 3, 2020 2:17 PM

What solution solved this issue? 

Update VM configuration for migrated VMs?


Tuesday, March 3, 2020 2:21 PM

For me it was upgrading the VM configuration and disabling d.VMMQ at both the host and VM level, using the PowerShell commands mentioned above.