Clustering Network Issue After Rebuilding Node: Node A is Reachable from Node B by Only One Pair of Interfaces

Article
2015-01-05

Question

_{Monday, January 5, 2015 10:38 PM}

Hey Guys -

I've got an issue I've been working on for a couple of days now and need help with. Our company has a VDI cluster with a total of 5 nodes. Recently, one went down and was rebuilt. I was told that all settings were configured as they should be, and I have verified that all of the NIC settings (static IPs, enabled options, etc.) are correct or match the other hosts.

The problem is that in VMM (2008), the node is still listed as "Needs Attention." When I run a validation on the cluster, many network-related issues appear. Below are examples of the two primary ones.

Note: Node C is the one which was rebuilt...

Error Type #1
Node C is reachable from Node B by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Node D is reachable from Node C by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Node C is reachable from Node D by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Node C is reachable from Node E by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Error Type #2
Network interfaces E - LiveMigration and C - LiveMigration are on the same cluster network, yet either address 10.50.7.23 is not reachable from 10.50.7.25 or the ping latency is greater than the maximum allowed 500 milliseconds.

Network interfaces E - LiveMigration and C - LiveMigration are on the same cluster network, yet either address 10.50.7.23 is not reachable from 10.50.7.25 or the ping latency is greater than the maximum allowed 500 milliseconds.

Network interfaces C - LiveMigration and E - LiveMigration are on the same cluster network, yet either address 10.50.7.25 is not reachable from 10.50.7.23 or the ping latency is greater than the maximum allowed 500 milliseconds.

Network interfaces C - LiveMigration and B - LiveMigration are on the same cluster network, yet either address 10.50.7.22 is not reachable from 10.50.7.23 or the ping latency is greater than the maximum allowed 500 milliseconds.

... and so on... there are a total of 12 of the above errors.
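One way to narrow down a list like this is to count which addresses show up in every failure. The following is a minimal Python sketch, assuming the message wording shown above; the function names are made up for illustration, and only the four quoted entries are used, not the full set of 12.

```python
import re
from collections import Counter

# Message wording copied from the validation report; only the interface
# letters and addresses vary between entries.
TEMPLATE = (
    "Network interfaces {a} - LiveMigration and {b} - LiveMigration are on "
    "the same cluster network, yet either address {target} is not reachable "
    "from {source} or the ping latency is greater than the maximum allowed "
    "500 milliseconds."
)

# The four entries quoted in the post (the first two are identical there).
MESSAGES = [
    TEMPLATE.format(a="E", b="C", target="10.50.7.23", source="10.50.7.25"),
    TEMPLATE.format(a="E", b="C", target="10.50.7.23", source="10.50.7.25"),
    TEMPLATE.format(a="C", b="E", target="10.50.7.25", source="10.50.7.23"),
    TEMPLATE.format(a="C", b="B", target="10.50.7.22", source="10.50.7.23"),
]

PATTERN = re.compile(
    r"address (?P<target>\d+\.\d+\.\d+\.\d+) is not reachable "
    r"from (?P<source>\d+\.\d+\.\d+\.\d+)"
)


def tally_addresses(messages):
    """Count how many messages mention each address, as source or target."""
    counts = Counter()
    for message in messages:
        match = PATTERN.search(message)
        if match:
            # A set, so an address is counted once per message.
            counts.update({match["target"], match["source"]})
    return counts


def common_addresses(messages):
    """Return the addresses that appear in every parsed failure message."""
    counts = tally_addresses(messages)
    parsed = sum(1 for message in messages if PATTERN.search(message))
    return sorted(ip for ip, n in counts.items() if n == parsed)


print(common_addresses(MESSAGES))
```

For the quoted entries, the only address common to all of them is 10.50.7.23, which the messages appear to pair with the C - LiveMigration interface, so that adapter on the rebuilt node would be a reasonable first place to look.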

If it helps any, I restarted two of the nodes (including the one which was rebuilt) and received an IP Address Conflict message. The error included the MAC of the conflicting NIC. I found out which node the MAC was on and looked at its IPv4 address (IPv6 is disabled on all NICs on all nodes), and it didn't match any of the addresses on the server that threw the error - weird!
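A conflict like this can be cross-checked offline by listing every statically configured address per node and looking for an IP claimed by more than one MAC. Here is a minimal Python sketch under that assumption; the inventory rows, MAC addresses, and the duplicated address are invented for illustration and are not taken from the cluster.

```python
from collections import defaultdict


def find_ip_conflicts(inventory):
    """Return {ip: [(node, adapter, mac), ...]} for IPs claimed by several MACs."""
    owners = defaultdict(list)
    for node, adapter, mac, ip in inventory:
        owners[ip].append((node, adapter, mac))
    return {
        ip: claims
        for ip, claims in owners.items()
        if len({mac for _, _, mac in claims}) > 1
    }


# Hypothetical inventory: (node, adapter, MAC, IPv4 address).
# Every value here is a made-up placeholder.
INVENTORY = [
    ("NodeB", "LiveMigration", "00-15-5D-00-00-0B", "10.50.7.22"),
    ("NodeC", "LiveMigration", "00-15-5D-00-00-0C", "10.50.7.23"),
    ("NodeD", "LiveMigration", "00-15-5D-00-00-0D", "10.50.7.23"),  # duplicate
    ("NodeE", "LiveMigration", "00-15-5D-00-00-0E", "10.50.7.25"),
]

for ip, claims in find_ip_conflicts(INVENTORY).items():
    print(ip, claims)
```

A check like this only covers addresses that are statically assigned to adapters. An address owned by a cluster resource moves between nodes, so it would have to be added to the inventory by hand to be caught.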

Any suggestions as to where to look or what to do? Thanks!

Ben K.

All replies (5)

_{Tuesday, January 6, 2015 5:19 AM}

Hi Ben,

Please provide the following details:

1) IP address details of the production and heartbeat NICs of node C

2) IP address details of the production and heartbeat NICs of node B

Since you have VMM 2008, we are not sure how the network is designed. However, please check whether node C has been assigned the proper virtual switch and whether the VLAN tag is correct (if one is used).

"I restarted two of the nodes (including the one which was rebuilt) and received an IP Address Conflict message."

Well, if you restart a node in a cluster, its cluster resources will fail over to another node. While this is happening, you may get an IP conflict message because the passive node receiving the resources tries to bring them online before the active node has released the IP resource. This leads to the IP conflict error. We resolved this issue in our cluster by updating the NIC drivers. Maybe you can try updating the NIC drivers on all the nodes and see if it helps.

Thanks,

Umesh.S.K

_{Wednesday, January 7, 2015 2:49 AM}

Hi Ben.K,

Additionally, make sure that your connectivity to your storage is on a different subnet, and check your network settings and network performance. If you are using a virtual LAN (VLAN), the one-way communication latency between any pair of cluster nodes on the VLAN must be less than 500 milliseconds.
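The subnet separation can be checked with a few lines of Python using the standard `ipaddress` module. This is only a sketch: the /24 prefix lengths and the storage address are assumptions, since the thread gives neither.

```python
import ipaddress


def networks_overlap(cidr_a, cidr_b):
    """Return True if the two interface addresses fall in overlapping subnets."""
    # strict=False lets us pass a host address such as 10.50.7.23/24
    # and get back the enclosing network 10.50.7.0/24.
    net_a = ipaddress.ip_network(cidr_a, strict=False)
    net_b = ipaddress.ip_network(cidr_b, strict=False)
    return net_a.overlaps(net_b)


# 10.50.7.23 is the LiveMigration address from the validation report;
# the /24 mask and the storage address 10.50.9.10 are hypothetical.
LIVE_MIGRATION = "10.50.7.23/24"
STORAGE = "10.50.9.10/24"

print(networks_overlap(LIVE_MIGRATION, STORAGE))
```

If this returns True for a storage adapter and a cluster network adapter on the same node, the two are sharing a subnet and the storage traffic is not isolated as suggested above.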

The related KB:

Quick Start Guide for Server Clusters

http://technet.microsoft.com/en-us/library/cc739757(v=ws.10).aspx

The related thread:

Cluster Validation Errorn- Network Error disjoint network

https://social.msdn.microsoft.com/forums/sqlserver/en-US/33469859-a6ac-4a0d-8ed9-800403a7eacf/cluster-validation-errorn-network-error-disjoint-network

I’m glad to be of help to you!


_{Wednesday, January 7, 2015 3:55 PM}

Good Morning Guys -

Thanks for your replies & suggestions! The good news is that I did get the other node to come back up, so the cluster now has 5 of 5 nodes online and active. The issue had a few causes: the Virtual Network Name on the node didn't match the cluster, the VMM agent hadn't been reinstalled on the rebuilt node, and a couple of others.

However - this brought two new issues that I hope you can assist with:

Issue #1 - "Unsupported Cluster Configuration" Status

We currently have about 200-300 VMs spread out amongst the nodes. After I got the rebuilt one back online, about 40 of them (spread out amongst all nodes) changed their status to "Unsupported Cluster Configuration." I cannot find anything in their configurations that sets these ~40 apart, as all VMs are set to use High Availability. The VMs with this status are still working: those that were already started can still be pinged & accessed, but I cannot do anything else with them.

Note: I did find a PowerShell script which was supposed to help identify the issue, but it failed because get-scvmhostcluster and other cmdlets couldn't be found, so I guess it only works for 2012+ (we run 2008).

Issue #2 - 6 Bad VMs

When bringing the rebuilt node back online, VMM showed that it had 6 VMs which were missing or in a bad state. Some of the names it listed had previously been migrated to other nodes and are alive and working on the other nodes while others do not exist anywhere anymore. How can these be resolved - especially without affecting VMs with the same names which are legit and working on the other nodes?

Thanks Guys - I appreciate your help!

Ben K.

_{Wednesday, January 7, 2015 6:14 PM}

Hi Ben,

Please check the below links which can help you.

http://blogs.technet.com/b/scvmm/archive/2009/08/10/fixing-unsupported-cluster-configuration-status-caused-by-virtual-network-settings.aspx

http://support.risualblogs.com/blog/2014/03/14/vmm-2012-issue-unsupported-cluster-configuration-and-virtual-switch-switch-name-is-not-highly-available-because-the-switch-in-host-hostname-is-not-compatible-with-other/

https://support.microsoft.com/kb/2822797?wa=wsignin1.0

Thanks,

Umesh.S.K

_{Thursday, January 8, 2015 7:46 PM}

Thanks!

I checked the links you suggested - some of which I had already found. I must say that I'm no pro at Clusters as my thing is SCCM, but have done my best.

I've tried to check into them, but often I'm not sure whether they even apply, as they are mostly for 2012 - OR - are for specific messages. Are the messages they reference ones that would appear in the VMM "Jobs" view or in another location? If in the Jobs view, then I can rule many out.

The strange thing is that I've compared every aspect of these VMs to others without the error and don't see any difference - especially in networking. Below is the message I get in Jobs when I try to simply refresh one of the VMs which lists the "Unsupported Cluster Configuration" error:

Warning (13921)
Highly available virtual machine WIN7-DEVELOP01 is not supported by VMM because one or more of its network adapters is not configured correctly.

Recommended Action
Ensure that all of the virtual network adapters are either disconnected or connected to highly available virtual networks.
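The recommended action can be read as a set check: every adapter on a highly available VM must be either disconnected or attached to a virtual network that exists on all nodes. Below is a minimal Python sketch of that reading. It assumes a virtual network counts as highly available when the same name (compared case-sensitively) is present on every host; VMM may compare other attributes as well. Apart from WIN7-DEVELOP01, all host, VM, and network names are hypothetical.

```python
def highly_available_networks(host_networks):
    """Return the virtual network names present on every host."""
    name_sets = [set(names) for names in host_networks.values()]
    return set.intersection(*name_sets) if name_sets else set()


def unsupported_vms(vm_adapters, host_networks):
    """Map each offending VM to the adapter networks that are not on all hosts.

    A network of None stands for a disconnected adapter, which is allowed.
    """
    available = highly_available_networks(host_networks)
    report = {}
    for vm, networks in vm_adapters.items():
        bad = sorted(n for n in networks if n is not None and n not in available)
        if bad:
            report[vm] = bad
    return report


# Hypothetical data: virtual networks defined on each host.
HOST_NETWORKS = {
    "NodeA": ["VM Network"],
    "NodeB": ["VM Network"],
    "NodeC": ["VM Network", "Legacy Net"],  # exists on one host only
    "NodeD": ["VM Network"],
    "NodeE": ["VM Network"],
}

# Hypothetical data: the network each VM adapter is attached to.
VM_ADAPTERS = {
    "WIN7-DEVELOP01": ["Legacy Net"],
    "VM-OK": ["VM Network"],
    "VM-DISCONNECTED": [None],
}

print(unsupported_vms(VM_ADAPTERS, HOST_NETWORKS))
```

Comparing names as exact strings is deliberate here: a difference in case or a trailing space on one host would be enough to drop a network out of the common set.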

Again, almost 40 VMs of about 250 list this error, a few of them on every node, and I can't find any setting common to the ones that have this message that the others lack.

I'm trying a few more things now and will post results, but if this rings a bell for anyone, please let me know - Thanks!

Ben K.

UPDATE 1

I was examining a few things based off articles I found online when I came across this post.

http://www.yusufozturk.info/virtual-machine-manager/scvmm-2008-r2-solution-of-warning-13921.html

(Sorry for full address but Hyperlink button wouldn't work)

We only have a single Network available for each VM - which all VMs should be set to. The kicker is that when I went to the properties of the cluster in VMM (halfway through his steps) and chose the "Networks" tab, nothing was listed!?!

I'd think that at least the one we have would be. I wanted to post this as it seems to be related or may provide a hint. Thanks!
