Issue connecting to primary replica on AAG environment

ice 5 Reputation points
2024-11-11T07:10:03.7666667+00:00

Hello.

I have an Always On availability group environment with three replicas: SQL01 (primary), SQL02 (secondary), and SQL03 (secondary), running Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64).

I encountered an issue where I could not connect to SQL01 via SQL Server Management Studio (SSMS). I then failed over to SQL02 as primary, tried to connect to SQL01 again, and the connection succeeded.

 

When the issue occurred, I confirmed that:

- I could connect remotely to all three servers hosting the SQL nodes

- The SQL Server services on all three nodes were running

- The AAG cluster was up (checked in Failover Cluster Manager)

- I could connect directly to SQL02 and SQL03, but not to SQL01
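
One way to separate plain network reachability from a login or instance-level hang is a bare TCP probe against each node. The following is a minimal sketch, not a definitive check: the helper name `tcp_reachable` is hypothetical, and port 1433 is an assumption (the default SQL Server port; the actual instances may listen elsewhere).

```python
import socket


def tcp_reachable(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Port 1433 is assumed here (SQL Server default); adjust it if the
    instance uses a different port.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `tcp_reachable("SQL01")` from a client machine would show whether the port answers at all. A successful TCP connect does not prove that a login would succeed, so a node that passes this probe but still rejects or hangs SSMS connections points toward the instance itself rather than the network.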

 

Below is the detailed chronology:

- 11:55 AM: the application backend encountered a database login error

- 12:00 PM: got the log entry

"A time-out occurred while waiting for buffer latch -- type 2, bp xxx, page xxx, stat xxx, database id: xxx, allocation unit Id: xxx, task xxx : 0, waittime 300 seconds, flags xxx, owning task xxx. Not continuing to wait."

- 12:01 PM: got the log entries

"Windows Server Failover Cluster did not receive a process event signal from SQL Server hosting availability group 'AAG' within the lease timeout period."

"Always On Availability Groups connection with secondary database terminated for primary database 'xxx' on the availability replica 'SQL03' with Replica ID: {xxx}. This is an informational message only. No user action is required."

"Always On Availability Groups connection with secondary database terminated for primary database 'xxx' on the availability replica 'SQL02' with Replica ID: {xxx}. This is an informational message only. No user action is required."

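When building a timeline like the one above from a longer error log, it can help to classify each message mechanically. The sketch below is only an illustration: the regular expressions are written against the three message texts quoted in this post, and the names `classify`, `LATCH_TIMEOUT`, `LEASE_TIMEOUT`, and `SECONDARY_DROPPED` are hypothetical, not part of any SQL Server tooling.

```python
import re

# Patterns derived from the message texts quoted in this post (assumption:
# other SQL Server builds may word these messages differently).
LATCH_TIMEOUT = re.compile(
    r"time-out occurred while waiting for buffer latch -- type (?P<type>\d+)"
    r".*?waittime (?P<wait>\d+) seconds"
)
LEASE_TIMEOUT = re.compile(
    r"did not receive a process event signal from SQL Server hosting "
    r"availability group '(?P<ag>[^']+)' within the lease timeout period"
)
SECONDARY_DROPPED = re.compile(
    r"connection with secondary database terminated for primary database "
    r"'(?P<db>[^']+)' on the availability replica '(?P<replica>[^']+)'"
)


def classify(message: str) -> dict:
    """Map one error-log message to a small dict describing its kind."""
    m = LATCH_TIMEOUT.search(message)
    if m:
        return {
            "kind": "buffer_latch_timeout",
            "latch_type": int(m["type"]),
            "wait_seconds": int(m["wait"]),
        }
    m = LEASE_TIMEOUT.search(message)
    if m:
        return {"kind": "lease_timeout", "availability_group": m["ag"]}
    m = SECONDARY_DROPPED.search(message)
    if m:
        return {"kind": "secondary_disconnected", "replica": m["replica"]}
    # Anything the patterns do not recognise is left unclassified.
    return {"kind": "other"}
```

Sorting the classified entries by timestamp makes the order of events easier to see; in this case, the buffer latch time-out at 12:00 PM precedes the lease timeout at 12:01 PM.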
 

After that, I tried to connect to SQL01 a few times but failed (it was still failing 30 minutes after the issue began). I then manually failed over to SQL02 as primary, tried to connect to SQL01 again, and succeeded.

 

Any idea why I could not connect to SQL01 at first, but could connect after failing over to SQL02? Usually, when network disruptions happen between the SQL nodes, I can connect to them again once the connection between the nodes is re-established.

 

Thank you.

SQL Server
Windows Server Clustering