Question
Friday, June 7, 2013 3:36 AM
Hi,
We encountered this problem on 25 April 2013 and need to find the root cause. We have the cluster.log file but are not experienced at interpreting it. Please assist.
Operating system: Windows Server 2008 R2 Enterprise Edition (64-bit)
Error message:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
All replies (9)
Friday, June 7, 2013 6:08 AM
Hi,
Was there a network failure, and what quorum type is configured?
Is it a two-node cluster?
If you run the cluster validation, what does it report (failures or warnings)?
Greetings, Robert Smit Follow me @clustermvp http://robertsmit.wordpress.com/ “Please click "Vote As Helpful" if it is helpful for you and Proposed As Answer” Please remember to click “Mark as Answer” on the post that helps you
Sunday, June 9, 2013 9:28 PM
It's been more than a month since you had the error. Has it happened again? Robert has provided common reasons for the issue, but looking for the root cause of an error that occurred over six weeks ago is going to be pretty difficult. And if it has not happened again, what is the concern?
.:|:.:|:. tim
Monday, June 10, 2013 9:11 AM
Dear Robert,
No network failure was reported. We are currently running a two-node, active-passive cluster.
We ran the cluster validation on the 'network' portion only and received the following warning.
Node2 is reachable from Node1 by only one pair of interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available or consider adding additional networks to the cluster.
As far as we understand, the cluster was configured without a dedicated heartbeat network, as we were told that this configuration is supported in Windows Server 2008 R2.
Regards,
Gan
Monday, June 10, 2013 9:15 AM
Dear Tim,
Yes, you are right, the problem only happened once. However, our management is demanding an explanation and a root cause, mainly because the system is very critical, and we also hope to avoid the same problem in future if we can find the root cause :)
Thank you.
Regards,
Gan
Tuesday, June 11, 2013 7:30 AM
Hi Gan,
Yes, your cluster can work with one network adapter, but if that network fails there is no communication between the cluster nodes anymore, and the cluster goes down. Add a second NIC to carry the cluster heartbeat, either on a different switch or as a direct link between the machines. As long as the nodes can see and talk to each other, the cluster stays up (in most cases).
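The single-path warning can be pictured with a small Python sketch. This is only an illustration, not anything the validation wizard actually runs: the component names (`node1-nic1`, `switch-a`, and so on) are hypothetical, and the model simply treats each node-to-node path as the set of parts it depends on.

```python
def single_points_of_failure(paths):
    """Return the components that every node-to-node path depends on.

    Each path is a set of hypothetical component names (NICs, switches,
    cables). A component that appears in every path cuts all
    communication between the nodes when it fails.
    """
    if not paths:
        return set()
    return set.intersection(*(set(p) for p in paths))


# One path through a single switch, as in the validation warning.
one_path = [{"node1-nic1", "switch-a", "node2-nic1"}]

# The same path plus a direct cable between a second NIC in each node.
two_paths = one_path + [{"node1-nic2", "direct-cable", "node2-nic2"}]

print(sorted(single_points_of_failure(one_path)))
print(sorted(single_points_of_failure(two_paths)))
```

With a single path, every component on it is a single point of failure; once an independent second path exists, the result is empty.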
Greetings, Robert Smit
Wednesday, June 12, 2013 2:12 AM
Hi Robert,
Can you advise what the main reasons are for quorum being lost? In our case, we do not use a private network for heartbeat; the heartbeat goes through the network switch. Can we therefore say that the network switch likely caused the quorum error? Or, if the domain server is down, would that also cause the quorum error?
Thank you.
Regards,
Gan
Wednesday, June 12, 2013 7:25 AM
Hi,
If the DC is gone, the cluster will keep running*, depending on the resources and on how long the DC is gone.
The cluster will have Kerberos errors if there is no DC! You need a DC to run your cluster.
If your switch is down, the cluster nodes can't talk to each other, and therefore the cluster went down. If you can, put a direct cable between the two nodes. Then, if the switch fails, the cluster will use the direct cable to communicate (you need two NICs in each node).
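The vote counting behind a "quorum was lost" shutdown can be sketched in a few lines of Python. This is a simplified model, not the Cluster service's actual logic, and it assumes a two-node cluster with a witness disk (one vote each, three in total). The thread never confirms the quorum type, so treat that layout as an assumption. The function name `has_quorum` is made up for the illustration.

```python
def has_quorum(votes_visible, total_votes):
    """A partition keeps quorum only with a strict majority of all votes."""
    return votes_visible > total_votes // 2


# Assumed layout: two nodes plus a witness disk, one vote each.
TOTAL_VOTES = 3

# The node can still reach its partner or owns the witness: 2 of 3 votes.
print(has_quorum(2, TOTAL_VOTES))

# The node is cut off from its partner and does not own the witness: 1 of 3.
print(has_quorum(1, TOTAL_VOTES))

# No witness at all (2 votes total): an isolated node never has a majority.
print(has_quorum(1, 2))
```

In this model a node that can see only its own vote stops, which matches the error text: loss of connectivity between the nodes, or a failover of the witness disk, can leave a node short of a majority.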
Understanding Quorum in a Failover Cluster
http://blogs.msdn.com/b/clustering/archive/2011/05/27/10169261.aspx
Details of How Quorum Works in a Failover Cluster
http://technet.microsoft.com/en-us/library/cc730649(v=ws.10).aspx
Understanding Quorum Configurations in a Failover Cluster
http://technet.microsoft.com/en-us/library/cc731739.aspx
Greetings, Robert Smit
Friday, June 14, 2013 8:22 AM
Hi Robert,
Thanks so much for the clarifications.
If possible, we still hope that we can get some definite answers from Microsoft if we can provide them the cluster logs and server logs. Basically we have the logs, but are not sure of some of the high-level terms stated there. Can you please advise how we can contact Microsoft technical support?
Regards,
Gan
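For anyone in the same position with a large cluster.log, a first step is to narrow the file to warnings and errors around the time of the incident. The Python sketch below assumes each line starts with `processid.threadid::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message`. Check that against your own file before relying on it. The sample lines, the `[EXAMPLE]` tag, and the time window are invented placeholders, not real log output. It is also worth checking whether the timestamps in your log are in UTC rather than local time.

```python
import re
from datetime import datetime

# Assumed line layout: pid.tid::date-time LEVEL message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def filter_log(lines, start, end, levels=("WARN", "ERR")):
    """Return (timestamp, level, message) for matching lines in the window."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed layout
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
        if start <= ts <= end and m.group("level") in levels:
            hits.append((ts, m.group("level"), m.group("msg")))
    return hits


# Invented placeholder lines, only to show the shape of the input.
SAMPLE = [
    "00000a1c.00000b2d::2013/04/25-01:59:58.100 INFO  [EXAMPLE] placeholder message 1",
    "00000a1c.00000b2d::2013/04/25-02:00:03.250 WARN  [EXAMPLE] placeholder message 2",
    "00000a1c.00000c3e::2013/04/25-02:00:04.000 ERR   [EXAMPLE] placeholder message 3",
    "00000a1c.00000c3e::2013/04/25-03:30:00.000 ERR   [EXAMPLE] placeholder message 4",
]

# Hypothetical five-minute window; the real incident time is not known here.
window_start = datetime(2013, 4, 25, 2, 0, 0)
window_end = datetime(2013, 4, 25, 2, 5, 0)

for ts, level, msg in filter_log(SAMPLE, window_start, window_end):
    print(ts.isoformat(), level, msg)
```

To run it against a real file, pass an open file object instead of `SAMPLE`, for example `filter_log(open("cluster.log", errors="replace"), window_start, window_end)`.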
Saturday, June 15, 2013 12:07 PM
Hi,
To contact Microsoft, check this website: http://technet.microsoft.com/en-us/library/dd346877.aspx
Greetings, Robert Smit