Question
Tuesday, May 12, 2015 10:21 AM
Hi
I have a Windows Server 2008 R2 4-node cluster. The cluster is configured as follows:
2 x nodes in Primary DC (1 is the active node)
2 x nodes in Secondary DC
1 x file share witness in third site
We had an issue last night whereby the 2 nodes in the secondary DC lost network communication due to a network event. The logs stated:
File share witness resource 'File Share Witness' failed a periodic health check on file share '\\fsw-01\Clus01'. Please ensure that file share '\\fsw-01\Clus01' exists and is accessible by the cluster.
The net effect was that the entire cluster stopped:
Cluster service was halted due to incomplete connectivity with other cluster nodes.
And:
Cluster node 'DC2-SQL1' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
Why did the entire cluster fail due to a networking issue that only affected the 2 nodes in the secondary site? The primary site nodes could still see the FSW.
Any insight would be great!
Thanks!
All replies (4)
Wednesday, May 13, 2015 9:00 AM
Hi Sjmry1,
As the error message suggests, could you run cluster validation and then post the errors and warnings it reports? If you have any antivirus software installed on these nodes, please disable or uninstall it first and then monitor whether the issue recurs.
I’m glad to be of help to you!
Wednesday, May 13, 2015 1:00 PM
Hi
I ran the cluster validation tool and the result was as follows:
Since we do not use cluster disks, that section showed a warning. One other issue was a 20-minute TTL on the cluster hostname DNS record. That has been modified, but it would not have contributed to the problem in the first place.
The final thing in the report is:
These are VMs with a single NIC, hence the warning above. We use highly available networking.
Any thoughts?
Thanks
Thursday, November 15, 2018 12:34 PM
Did you lose FSW communication during this incident? If so, this is the expected quorum behavior in Windows Server 2008: it appears you lost 2 nodes and the FSW, which means you had only 2 votes of the 3 required (a majority of 5).
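To make the vote arithmetic concrete, here is a minimal Python sketch of a simple majority check. It assumes one vote per node plus one vote for the file share witness, as in the configuration described in the question. The function names are hypothetical and this is only an illustration of the counting, not how the Cluster service is actually implemented.

```python
def votes_required(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the votes still reachable form a strict majority."""
    return votes_present >= votes_required(total_votes)


# Assumed layout from the question: 4 nodes + 1 file share witness.
TOTAL_VOTES = 4 + 1

# Scenario described in this reply: both secondary nodes and the FSW
# are unreachable, so only the 2 primary nodes can vote.
print(votes_required(TOTAL_VOTES))   # votes needed to keep quorum
print(has_quorum(2, TOTAL_VOTES))    # primary nodes only
print(has_quorum(3, TOTAL_VOTES))    # primary nodes plus the FSW
```

Under these assumptions, the two primary nodes alone fall short of the majority, whereas the same two nodes plus a reachable witness would have kept the cluster running.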
Thursday, November 15, 2018 12:35 PM
Also, if the FSW was actually online at the time, there is a hotfix for that case: https://support.microsoft.com/en-us/help/978790/the-file-share-witness-resource-is-in-a-failed-state-even-though-the-f