Thursday, November 30, 2017 11:44 AM
I installed SP4 on my active/passive cluster in the following manner:
Updated the passive node.
Attempted to fail over, but the failover failed.
Thinking it was an issue with a mismatch (which has never mattered before), I went ahead and updated the active node.
At that point, neither node would start the SQL Server role. I received error 1069, which was probably the main culprit, followed by 1205 and 1254. I cannot find ANYTHING that tells me more (such as why specifically I am getting 1069), but all disk resources are online, the registry values look good, the IPs are as they should be, etc. I spent a couple of hours on Google and didn't find much other than possible password issues, drives being offline, etc., but I verified all of that. I ran the command to dump the cluster logs using PowerShell, but the only errors I found in there were, according to my research, misfires that occur when you don't use Hyper-V for the cluster.
My maintenance window ended and I had to get this thing back into production, so I rolled back SP4 on the passive node. As soon as that completed, I was able to run the SQL Server role on it. I then rolled back on my primary and was able to let it run the cluster as well.
This leads me to believe there is an issue with SP4 itself. I am on Windows Server 2012 R2 Standard, with the latest Windows patches installed.
Any ideas? I installed the same patch on 2 other non-clustered SQL Servers, and I had no issues. Additionally, the cluster itself wasn't totally broken - I also cluster MSDTC, and it was able to hop from server to server during all of this with no issue.
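When a cluster log dump is long, it can help to see which components are logging errors before reading individual lines. The following is only a rough sketch: it assumes each line of the dumped cluster log looks roughly like `<process>.<thread>::<timestamp> <LEVEL> [<component>] <message>`, which may not match every Windows version, and the function name `errors_by_component` is made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout (may differ between Windows versions):
#   000012a4.00001f30::2017/11/30-17:44:12.345 ERR   [RES] message text
LINE_RE = re.compile(
    r"^\S+::(?P<ts>\S+)\s+"
    r"(?P<level>INFO|WARN|ERR|DBG)\s+"
    r"(?:\[(?P<component>[^\]]+)\]\s*)?"
    r"(?P<msg>.*)$"
)


def errors_by_component(lines):
    """Count ERR-level lines per bracketed component tag (e.g. RES, RCM)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\r\n"))
        if match and match.group("level") == "ERR":
            # Lines without a [component] tag are grouped under "unknown".
            counts[match.group("component") or "unknown"] += 1
    return counts
```

To use it, pass any iterable of lines, for example an open file handle for the dumped log. The file encoding may need to be specified explicitly when opening it.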
Thursday, November 30, 2017 12:52 PM
Have you restarted the servers (take offline and then bring online) after installing SP4?
Best Regards, Uri Dimant, SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/
Thursday, November 30, 2017 2:03 PM
You might try checking the setup bootstrap log for more informative error messages. It's located approximately here:
C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\Log
What version of Windows is your server? Has the OS been patched up to current and rebooted recently?
Are they physical servers or VMs?
SQL Server Failover Cluster rolling patch and Service Pack Process
Did you test failover and failback between active and passive nodes successfully, prior to applying SP4?
You've probably already seen these articles:
Event ID 1205
Event ID 1069
HTH,
Phil Streiff, MCDBA, MCITP, MCSA
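The setup bootstrap log folder mentioned above typically holds a Summary.txt plus timestamped subfolders with more detailed logs, so there can be many files to look through. As a minimal sketch (the keyword list and the helper names `read_text` and `find_error_lines` are assumptions for illustration, not part of any SQL Server tooling), the folder could be scanned for suspicious lines like this:

```python
from pathlib import Path

# Assumption: illustrative keywords only, not an official list of failure markers.
KEYWORDS = ("error", "failed", "exception")


def read_text(path):
    """Decode a log file, handling a UTF-16 byte order mark if one is present."""
    data = path.read_bytes()
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return data.decode("utf-16", errors="replace")
    return data.decode("utf-8", errors="replace")


def find_error_lines(log_dir, keywords=KEYWORDS, patterns=("*.txt", "*.log")):
    """Return (path, line_number, text) for every line mentioning a keyword."""
    hits = []
    root = Path(log_dir)
    for pattern in patterns:
        # rglob also descends into the timestamped subfolders.
        for path in sorted(root.rglob(pattern)):
            for number, line in enumerate(read_text(path).splitlines(), start=1):
                lowered = line.lower()
                if any(keyword in lowered for keyword in keywords):
                    hits.append((path, number, line.strip()))
    return hits
```

Calling `find_error_lines(r"C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\Log")` would then list candidate lines to read first. Expect false positives, since many informational lines also contain the word "error".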
Thursday, November 30, 2017 2:58 PM
Uri - Yes I did.
Thursday, November 30, 2017 3:10 PM
There are a lot of files in that folder location - which one am I interested in? I see a text file named <SQLClusterName>_Cluster, and it has several lines like this:
mscs::TopologyPersister::TryGetNetworkPrivateProperties: (2)' because of 'OpenSubKey failed.'
They are Windows 2012 R2 Standard. The very latest patches were installed prior to installing SP4 (I do this once a month on all servers), and the server WAS rebooted before and after I installed SP4.
The servers are physical.
I followed the steps in that article, but when I tried to fail the cluster over to the passive server that was patched first, that is when I experienced my first failover issue. I thought that perhaps the issue was that this SQL patch did not support having different builds on the same cluster, so I went ahead and patched the active node.
I tested the failover indirectly, if you will, prior to the update. Since I was installing Windows Updates before the SP install, I rebooted the active node, which failed the cluster over successfully to the passive node. After the (normally) active node was back online and the (normally) passive node was done with its update installs, I rebooted the passive server to go back to my primary/better server. I had no issues at this time, nor do I have issues when I do this same thing every month. The problems ONLY happened when SP4 was installed. I had my network administrator remove SP4 from the passive node while I researched the issue, and as soon as SP4 came off, he was able to start SQL on that node in the cluster. When I finally gave up and pulled it from the primary, we were able to fail it back over.
Yes, I did see those articles or ones very similar to them. I checked the timeout settings and increased them as appropriate, and regarding resources being offline, all dependencies showed as online during this process.
I appreciate the help!!!
Thursday, November 30, 2017 10:42 PM
NET HELPMSG 1069 = The service did not start due to a logon failure. 1205 = Unable to open the network connection profile. 1254 = This operation is not supported on a computer running Windows Server 2003 for Small Business Server.
There seems to be an issue with the service account, and 1205 may indicate why that happens. But if there is some network problem, why would it resolve by rolling back to SP3?
And the message 1254 appears completely bogus. Did you type the wrong number?
I don't know exactly what the problem might be, but the error numbers do point in a certain direction, and one test could be to try a different service account.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
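For anyone wanting to repeat the NET HELPMSG lookups from a script, here is a small hedged sketch. It uses `ctypes.FormatError`, which is only available on Windows, and otherwise falls back to the three message texts quoted in this thread. The function name `win32_message` is made up for illustration. Note that NET HELPMSG resolves Win32 system error codes. If the numbers in the original post are Failover Clustering event IDs (the linked articles are titled "Event ID 1205" and "Event ID 1069"), they belong to a different numbering space and these texts may not apply.

```python
import ctypes

# Message texts as quoted in this thread from NET HELPMSG.
WIN32_MESSAGES = {
    1069: "The service did not start due to a logon failure.",
    1205: "Unable to open the network connection profile.",
    1254: ("This operation is not supported on a computer running "
           "Windows Server 2003 for Small Business Server."),
}


def win32_message(code):
    """Return the system message for a Win32 error code.

    On Windows the text comes from the OS; elsewhere the small table
    above is used, so only the three codes from this thread resolve.
    """
    format_error = getattr(ctypes, "FormatError", None)  # Windows only
    if format_error is not None:
        return format_error(code).strip()
    return WIN32_MESSAGES.get(code, "unknown code %d" % code)
```

On a Windows machine, `win32_message(1069)` returns whatever text the operating system has for that code in the current display language.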
Thursday, November 30, 2017 11:08 PM
I saw lots of articles about 1069 being due to a login failure, but I reentered the passwords both through the Windows service manager and the SQL Configuration Manager. I also tried using the local system account instead, and it had the same issue. Further, after rolling back the service pack, those accounts worked - I made no further changes after uninstalling that patch.
Yes, 1254 is accurate, but that has to do with the timeout being exceeded. That was hit immediately upon trying to start the service every time - the errors all come through within milliseconds of each other.
Friday, December 1, 2017 10:39 PM
Well, the 1205 error indicated a problem accessing the profile. But in that case, I would have expected Local System to work.
Could it be an evil Anti-Virus product?
If there really was a general issue with SP4 in clusters, I guess I would have heard about it by now, so I'm inclined to assume that there is something in your environment that triggers this. (But it could still be a bug in SQL Server.)
Since this can be difficult to troubleshoot in a forum, I'm inclined to recommend that you open a case, although it can be costly if it is not considered to be a bug in a Microsoft product.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Friday, December 1, 2017 11:23 PM
Thanks. I did try unloading my AV software to see if that would help at all, but it did not. I am trying to avoid the case for now, hoping someone will pick this up at MS or someone has seen this and found a fix. I'll hold my breath! :-)