On the same cluster network, yet address 10.67.13.31 is not reachable from 10.67.13.32

Article
2015-05-11

Question

_{Monday, May 11, 2015 2:12 PM}

Hello,

This is the second time this damn issue has shown up. The first time, I was forced to evict four of the five nodes, restart them all, and join them to the cluster again to fix it. The issue affects migration between servers, and if I evict a node or try to join one, the operation fails until I restart them all.

- All servers are reachable on both the production and heartbeat networks, tested with ping and Telnet.

- The firewall is disabled and no antivirus is installed on any node.

- All servers (Windows Server 2012 R2) are up to date with the latest Windows patches, as are the network and storage drivers.

Thanks

All replies (11)

_{Wednesday, May 20, 2015 3:53 PM ✅Answered | 1 vote}

Hi,

The cluster network is at the top, and this is fine.

But your drivers are old; the current version is 10.2.478.1. Always use the latest driver if you have trouble!

Greetings, Robert Smit (@clustermvp, http://robertsmit.wordpress.com/)

_{Wednesday, May 27, 2015 3:24 AM ✅Answered}

Restarting the physical host seems to have fixed the problem for now. Anyway, I updated the network driver, and I will monitor the network behavior and report back if the problem occurs again. Thanks for the support, guys.

_{Wednesday, May 13, 2015 7:18 AM}

Hi Sn0w_MOnkEY,

You are using incorrect internal network settings. Please refer to the following articles to determine which adapter you are preparing for the heartbeat, disable the firewall on all nodes, and then run the validation again.

Recommended private "Heartbeat" configuration on a cluster server

https://support.microsoft.com/en-us/kb/258750

Configuring Windows Failover Cluster Networks

http://blogs.technet.com/b/askcore/archive/2014/02/20/configuring-windows-failover-cluster-networks.aspx

I’m glad to be of help to you!

_{Wednesday, May 13, 2015 2:42 PM}

Hi Alex,

All five physical hosts were working perfectly, and the validation tests all passed with every node reachable. The issue appears randomly over time. This is the second time; to fix it I have to evict all the nodes, restart the failed node, and rejoin them all. This is not a permanent solution.

The first link you provided recommends using a heartbeat network, which I have already configured, and in any case the problem is on the production network, not the heartbeat one.

The second link recommends network configuration that is already in place:

- All five hosts use the same VLAN.

- All five hosts are reachable through both the production network and the heartbeat network.

- The three NICs have identical names on all five hosts (1#External Switch, 2#Production, 3#Heartbeat).

- The network binding order is the same on all five hosts (1#External Switch, 2#Production, 3#Heartbeat).

- All five hosts are updated with the latest Windows patches and drivers.

- The firewall is disabled and no antivirus is installed on any of the five hosts.
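
A ping or Telnet test like the one described above can be repeated from every node with a short script. The following is only a minimal sketch in Python: it assumes the check of interest is a TCP connection to port 3343 (the port the cluster service uses), and the names `tcp_reachable` and `check_nodes` are made up for illustration. It does not exercise the UDP heartbeat traffic, so a pass here does not prove the cluster path is healthy.

```python
import socket

# Assumption: 3343 is the cluster service port we want to probe over TCP.
CLUSTER_PORT = 3343


def tcp_reachable(host, port=CLUSTER_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_nodes(addresses, port=CLUSTER_PORT, timeout=2.0, probe=tcp_reachable):
    """Map each address to the probe result; run this on every node in turn."""
    return {addr: probe(addr, port, timeout) for addr in addresses}
```

Calling `check_nodes(["10.67.13.31", "10.67.13.32"])` on each node in turn, with the remaining node addresses added to the list, gives a reachability result per source node that can be compared once the problem reappears.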

_{Thursday, May 14, 2015 2:42 PM}

Sounds like it might be a hardware issue that is wedging the system. Are you using the same NIC in all servers? Same firmware on all NICs? Is it the same server that initially fails?

. : | : . : | : . tim

_{Saturday, May 16, 2015 11:41 PM}

Hi Tim,

No, I am not using the same NIC for all servers, but I don't think this is mandatory as a requirement. What matters is the same binding order and maybe (I'm not sure) the same name.

Regarding the firmware: yes, all the same version (a fresh new 2012 R2 system).

Also, no, it's not the same server. Last time it was node 4; this time it's node 3 :(

_{Monday, May 18, 2015 2:22 PM}

"No i am not using the same NIC for all servers but i don't think this is mandatory as a requirement"

You are correct. It is not a mandatory requirement. Most of the time different NICs should perform just fine. But any time there are differences in the implementation of a protocol that is required to be used (such as two vendors' drivers), the possibility of problems creeps in.

But maybe you misinterpreted my question. When I was asking about the 'same NIC', I was asking about NIC vendor/model. You do state that you have the same firmware, so the only way that is possible is if you are using the same NIC vendor/model for all your connections.

And, though the "last time it was node 4 .. this time it's node 3" points to different nodes, the real question is if it is the same NIC that is involved each time. Just because it is a different node that is reporting does not mean that it is a different NIC causing the problem, because in any network traffic, there are always two NICs involved.

. : | : . : | : . tim
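
The point above, that every failed path involves two NICs, can be checked by recording both endpoints of each failure and seeing which address shows up every time. This is a hedged sketch in Python; the function name `endpoints_in_every_incident` and the second incident in the usage note are hypothetical and not taken from the thread.

```python
def endpoints_in_every_incident(incidents):
    """Return the addresses that appear, as source or destination, in every incident.

    `incidents` is a list of (source, destination) address pairs, one pair
    per "X is not reachable from Y" event.
    """
    if not incidents:
        return set()
    common = set(incidents[0])
    for source, destination in incidents[1:]:
        # Keep only addresses that are also an endpoint of this incident.
        common &= {source, destination}
    return common
```

For example, `endpoints_in_every_incident([("10.67.13.32", "10.67.13.31"), ("10.67.13.33", "10.67.13.31")])` returns `{"10.67.13.31"}`. The first pair comes from the thread title and the second is invented for illustration. An address that survives across all incidents would be the NIC to look at first, even if a different node reports the failure each time.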

_{Tuesday, May 19, 2015 5:06 PM}

Hi,

I don't understand: why evict the node?

Did you see this issue? Virtual machines lose network connectivity when you use Broadcom NetXtreme 1-gigabit network adapters

https://support.microsoft.com/en-us/kb/2986895?wa=wsignin1.0

And if the network fails on Hyper-V 3, can you still use the vSwitch? Just to make sure it is the cluster and not a problem in or below the vSwitch. Are the network drivers the same version?

_{Wednesday, May 20, 2015 11:46 AM}

Hi Tim,

Yes, I am using the same vendor. This is my NIC firmware and version.

But I am not using the same NIC interface order. That is, some of the other machines use Adapter #1 for the heartbeat instead of the management network, but I don't think this could be a problem.

Anyway, I noticed the driver version is kind of old. I will download the latest version and report back.

_{Wednesday, May 20, 2015 11:52 AM}

Hello Robert,

Once this issue shows up, I am unable to migrate any VM from or to this node, and a normal restart doesn't fix it. I had given up on fixing it, so I decided to evict all the nodes and create a new cluster. After removing all the nodes and restarting the failed one alone, the issue was resolved.

Regarding connectivity: yes, I am still able to use the vSwitch normally and all VMs are reachable. And yes, I am using the same version.

_{Thursday, May 21, 2015 6:38 AM}

I will update the driver tomorrow morning and report back ASAP :).
