Cluster disk resource is taking 15 minutes to come online after failover

Apurva Pathak 635 Reputation points
2024-11-04T14:33:53.1766667+00:00

Hi folks,

We have a two-node cluster with a role that owns three cluster disks (one for quorum and the others for data). Failover from Node A to Node B (Node A --> Node B) works fine. However, whenever we try to fail back from Node B to Node A (Node B --> Node A), one of the cluster disks (Cluster Disk 1 in the screenshot below) takes more than 15 minutes to come online.

We ran cluster validation at least six times, and it shows no issues. There are no network issues between the hosts, because both are in the same subnet and the OS firewalls are turned off. We also checked the IOPS of the disks and everything seems normal (though a disk-level issue should affect both nodes, not just one).

We tried analyzing the cluster logs, but noticed some unusual behavior: the cluster service didn't write anything to the logs until the disk came up (i.e., there are no log entries for ~15 minutes), and we don't see any events in the Event Log during that period either.
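A silent window like the one described above can be located by scanning an exported cluster log for large jumps between consecutive timestamps. The following is a minimal Python sketch, not a definitive tool. It assumes the log has been exported to a text file (for example with `Get-ClusterLog`) and that each entry contains a timestamp in the form `YYYY/MM/DD-HH:MM:SS.mmm`. The function name `find_gaps` and the 60-second default threshold are hypothetical choices made for illustration.

```python
import re
from datetime import datetime

# Assumed timestamp layout: YYYY/MM/DD-HH:MM:SS.mmm (adjust if your log differs).
TS_PATTERN = re.compile(r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})")
TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def find_gaps(lines, min_gap_seconds=60.0):
    """Return (previous_line, next_line, gap_seconds) for each silent period."""
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        match = TS_PATTERN.search(line)
        if not match:
            continue  # lines without a timestamp are skipped
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((prev_line.rstrip(), line.rstrip(), delta))
        prev_ts, prev_line = ts, line
    return gaps
```

To use it, pass an open file handle (or any iterable of lines) to `find_gaps` and inspect the entries on either side of the longest gap. The last entry before the gap is usually the most interesting one, since it shows what the cluster was doing when it went quiet.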

Please find the screenshots below:

[Screenshot 1]

[Screenshot 2]

Any help in investigating this would be highly appreciated!

Thanks in advance!

Windows Server 2019
Windows Server
Windows Server Clustering

2 answers

  1. Ian Xue 38,046 Reputation points Microsoft Vendor
    2024-11-06T04:06:45.6666667+00:00

    Hi Apurva Pathak,

    Thanks for your post. It looks like there is a performance issue in the infrastructure of the problem node. Performance in virtual infrastructure is generally analyzed in this order:

    1. RAM - if your virtual machine has insufficient RAM, or if you are forcing your host into over-provisioning RAM, that's going to be your primary cause of performance issues.
    2. Disk performance. Latency and throughput are the key indicators here. If throughput is constantly near the physical limit of the disk access chokepoint, performance will suffer. If read latency is averaging over 20ms or peaking over 100ms, or if write latency is averaging over 10ms or peaking over 25ms, you're going to see performance issues.
    3. CPU and Network are tied distantly for third place. Modern CPU and Network are so much more capable than most infrastructures need that this is rarely an issue.
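
The disk latency thresholds in item 2 can be applied to collected samples with a few lines of code. This is a minimal Python sketch, assuming latency samples in milliseconds have already been gathered by some monitoring tool. The names `LIMITS` and `latency_flags` are hypothetical, and the numbers are simply the rule-of-thumb values quoted above, not official limits.

```python
from statistics import mean

# Thresholds in milliseconds, copied from the rule of thumb above.
LIMITS = {
    "read": {"avg": 20.0, "peak": 100.0},
    "write": {"avg": 10.0, "peak": 25.0},
}


def latency_flags(kind, samples_ms):
    """Return a list of threshold violations for 'read' or 'write' samples."""
    limits = LIMITS[kind]
    flags = []
    if not samples_ms:
        return flags  # nothing measured, nothing to flag
    avg = mean(samples_ms)
    peak = max(samples_ms)
    if avg > limits["avg"]:
        flags.append(f"{kind} average {avg:.1f} ms exceeds {limits['avg']:.0f} ms")
    if peak > limits["peak"]:
        flags.append(f"{kind} peak {peak:.1f} ms exceeds {limits['peak']:.0f} ms")
    return flags
```

Running the same check on samples from both nodes would make it easier to see whether only the problem node crosses these thresholds.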

    Best Regards,

    Ian Xue




  2. Alex Bykovskyi 2,166 Reputation points
    2024-11-06T15:01:31.0966667+00:00

    Hey,

    You should check that your storage is connected properly on both nodes. In addition, as mentioned, storage performance should be checked. Might be helpful: https://community.spiceworks.com/t/slow-failover-in-windows-2016-file-server-cluster/564284

    In addition, you should check that your nodes and cluster have DNS records on your DNS server.
    https://learn.microsoft.com/en-us/answers/questions/306283/failover-cluster-is-adding-dns-a-record-for-cluste
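
A quick way to confirm that the node and cluster names resolve is to look them up from each node. Below is a minimal Python sketch using the standard `socket.getaddrinfo` call. The function name `resolve_names` and the host names in the usage note are hypothetical placeholders. Note that this checks name resolution as seen by the machine running the script (which may include the local hosts file), so it is only an approximation of checking the records on the DNS server itself.

```python
import socket


def resolve_names(names, resolver=socket.getaddrinfo):
    """Map each name to the sorted list of addresses it resolves to ([] on failure)."""
    results = {}
    for name in names:
        try:
            infos = resolver(name, None)
        except socket.gaierror:
            results[name] = []  # name did not resolve
            continue
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

For example, `resolve_names(["nodea.example.test", "nodeb.example.test", "cluster1.example.test"])` returns a dictionary in which any name with an empty list failed to resolve. Replace the placeholder names with your own node and cluster names.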

    As another option, you can test storage failover with a different storage solution. StarWind VSAN can be used for that. https://www.starwindsoftware.com/starwind-virtual-san

    Cheers,

    Alex Bykovskyi

    StarWind Software

    Note: Posts are provided “AS IS” without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.
