SCOM Management server recovery

Question

Seyedmajid Taheri

Hi guys.

In my test environment, I have SCOM 2019 UR6 consisting of three Management Servers, four Gateway Servers, one server for the Data Warehouse database, and one server for the Operational database.

Yesterday, I attempted to perform an in-place upgrade to SCOM 2022. I followed the required pre-upgrade steps according to Microsoft’s documentation and Kevin Holman’s blog.

When I tried to upgrade the first Management Server, the wizard failed at the "Configure Operational Database" step, and then the Management Server was automatically removed from the system. After that, the other two Management Servers also went down.

To recover the environment, I first restored both the Operational Database and the Data Warehouse Database to their pre-upgrade state. Then, I recovered the first failed Management Server using the /Recover command, and I was able to reconnect the console.

Afterward, I re-entered the password for the Management Server Action Account in the console. However, the Event Viewer on all Management Servers still shows the following events:

Could you guys please help me resolve the issue?

Thank you.

OpsMgr has no configuration for management group SCOMMGTEST and is requesting new configuration from the Configuration Service.

OpsMgr Management Configuration Service failed to process configuration request (Xml configuration file or management pack request) due to the following exception
Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.HealthServicePublicKeyNotRegisteredException: Padding is invalid and cannot be removed.
Server stack trace: 
       at Microsoft.EnterpriseManagement.RuntimeService.RootConnectorMethods.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.RuntimeService.SDKReceiver.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)
       at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs)
       at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg)
    
    Exception rethrown at [0]: 
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at Microsoft.EnterpriseManagement.Mom.Internal.ISdkService.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Communication.CredentialDataProvider.GetSecureDataUnwrapped(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Communication.CredentialDataProvider.GetSecureData(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.TracingCredentialDataProvider.GetSecureData(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.WriteSecureData(AgentConfigurationStream stream, XmlWriter writer, Guid agentId, Hashtable credentialAssociationList, Hashtable credentialList)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.WriteSnapshotState(AgentConfigurationStream stream, XmlWriter writer, AgentValidatedConfiguration validatedConfig)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.GetSnapshotConfigurationStream(AgentValidatedConfiguration validatedConfig, AgentConfigurationCookie oldCookie, AgentConfigurationCookie& newCookie)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationBuilder.FormatConfig(ConfigurationRequestDescriptor requestDescriptor, IAgentConfiguration agentConfig)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentRequestProcessor.ProcessConfigurationRequest(ICollection`1 requestList, Int32& processedRequestsCount)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentRequestProcessor.Execute()
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.ThreadManager.ResponseThreadStart(Object state)


Answer 1

SChalakov (MVP, Volunteer Moderator)

Hi,

Were you able to resolve the issue? Could you please provide an update? If not, I will try to help you out!

Regards

Stoyan

Seyedmajid Taheri

2025-12-10T14:06:52.1266667+00:00

Hi, thanks for your reply. Unfortunately, I still haven't been able to fix it. Thanks in advance for your help.

Answer 2

Hi,

based on the symptoms and your recovery steps, what you are seeing now is no longer the original upgrade problem but a configuration/crypto issue in the recovered SCOM 2019 management group.

The exception

HealthServicePublicKeyNotRegisteredException: Padding is invalid and cannot be removed

is thrown when the Management Configuration Service tries to build configuration that contains secure data (Run As credentials, action accounts, etc.) but cannot decrypt that data with the current encryption keys. This is a known pattern after a restore/recovery when not all Run As / service account credentials or keys are in a consistent state.

Because the exact cause is not obvious from a single event, start with some generic health checks and then make a focused pass on Run As accounts and secure storage. Below is a step-by-step plan; several of the steps only collect more information.

1. Baseline – confirm the recovered state of the management group

In the console, check Help > About and Administration > Device Management > Management Servers, and verify that:
- All management servers report version SCOM 2019 UR6, not 2022.

On the SQL server:
- Confirm that the OperationsManager and OperationsManagerDW databases were both restored from backups taken immediately before you started the 2022 upgrade (same point in time).

On each management server:
- In “Apps & Features” / “Programs and Features”, verify that you see only SCOM 2019 components and no leftover SCOM 2022 setup.

This ensures you truly have a clean 2019 UR6 environment to stabilize.

2. Check core services and All Management Servers Resource Pool

On each management server, make sure the following services have Startup type = Automatic and Status = Running:
- System Center Data Access Service (OMSDK)
- System Center Management Configuration (cshost)
- System Center Management (HealthService)
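If you want to check several management servers the same way, the service check above can be scripted. The following Python sketch is illustrative only: it assumes you have already collected each service's startup type and status into a dictionary per server (for example, from the StartType and Status properties that PowerShell's Get-Service reports), and both the input format and the function name `service_problems` are hypothetical.

```python
# Short service names and expected settings taken from step 2.
REQUIRED_SERVICES = ("OMSDK", "cshost", "HealthService")


def service_problems(states):
    """Return a list of problems for the SCOM management server services.

    states maps a service short name to a (startup_type, status) tuple;
    this input format is a hypothetical choice for the sketch.
    """
    problems = []
    for name in REQUIRED_SERVICES:
        if name not in states:
            problems.append(f"{name}: not found")
            continue
        startup, status = states[name]
        # Accept "Automatic" as well as "Automatic (Delayed Start)".
        if not startup.startswith("Automatic"):
            problems.append(f"{name}: startup type is {startup}")
        if status != "Running":
            problems.append(f"{name}: status is {status}")
    return problems


states = {
    "OMSDK": ("Automatic", "Running"),
    "cshost": ("Manual", "Stopped"),
    "HealthService": ("Automatic", "Running"),
}
print(service_problems(states))
# ['cshost: startup type is Manual', 'cshost: status is Stopped']
```

An empty list means all three services match the expected settings on that server.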

Then in the console:

Go to Monitoring > Operations Manager > Management Server State and All Management Servers Resource Pool.
Note whether the management servers are Healthy or Grey/Unhealthy, and whether there are any alerts like “All Management Servers Resource Pool Unavailable”.

If the pool is completely down, everything else will be affected, so this is the first thing to get back to green.

3. Targeted event log review on all management servers

On each management server:

Open Event Viewer > Applications and Services Logs > Operations Manager.
Create a custom view or filter for:
- Time range: last 1–2 hours
- Event IDs: 21023, 29120, 20070, 20071, 21016, 26319, 31551–31553, 1102, 1103
Check:
- Are 21023 (“has no configuration…”) and 29120 (HealthServicePublicKeyNotRegisteredException) the main repeating errors?
- Or do you also see:
  - SQL / login / timeout errors (31551–31553, 26319),
  - Run As logon failures,
  - or agent-connectivity issues (20070, 21016)?

Make a note of the exact set of repeating event IDs – this will show whether we are dealing with a pure secure-data problem or a broader connectivity issue.
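If you export the filtered events (for example, the Id values returned by PowerShell's `Get-WinEvent -LogName 'Operations Manager'`), the triage in step 3 can be expressed as a small Python sketch. The grouping of IDs mirrors the checklist above; `triage`, the `min_repeats` threshold, and the verdict strings are hypothetical choices for illustration, and IDs outside these groups (such as 20071, 1102, 1103) are counted but not classified.

```python
from collections import Counter

# Event-ID groups taken from the checklist in step 3.
SECURE_DATA_IDS = {21023, 29120}
SQL_IDS = {31551, 31552, 31553, 26319}
CONNECTIVITY_IDS = {20070, 21016}


def triage(event_ids, min_repeats=2):
    """Return (repeating_ids, verdict) for a list of Operations Manager event IDs.

    min_repeats is a hypothetical threshold for "repeating"; adjust as needed.
    """
    counts = Counter(event_ids)
    repeating = sorted(eid for eid, n in counts.items() if n >= min_repeats)
    found = set(repeating)
    if found & (SQL_IDS | CONNECTIVITY_IDS):
        verdict = "broader connectivity issue"
    elif found & SECURE_DATA_IDS:
        verdict = "pure secure-data problem"
    else:
        verdict = "inconclusive"
    return repeating, verdict


# Example: IDs copied from a filtered Event Viewer export.
print(triage([21023, 29120, 21023, 29120, 1102]))
# ([21023, 29120], 'pure secure-data problem')
```

SQL and connectivity IDs take priority in the verdict because, as step 7 notes, the configuration service cannot build valid config while those problems persist.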

4. Re-enter all Run As and service account passwords

After a DB restore or management server recovery, it is mandatory to re-enter the passwords for every Run As account, not just the Management Server Action Account. Missing or stale Run As credentials after recovery are a documented cause of this exact error.

In the SCOM console:

Go to Administration > Run As Configuration > Accounts. For each relevant account listed below, open Properties, re-enter the password on the credentials page (even if you are sure it is correct), and finish the wizard without changing distribution or scope:
- Action Account (Management Server Action Account)
- Data Warehouse Write Account
- Data Warehouse Read Account
- Any additional Run As accounts (SQL monitoring, network devices, Unix/Linux, custom MPs, connectors, etc.)

Still under Run As Configuration, open Profiles and check the most important ones (e.g. Data Warehouse Account, SDK and Config Service Account, application packs):
- Verify that the expected Run As account is still associated with each profile and scope.

On each management server, restart these services in this order:
- System Center Data Access Service
- System Center Management Configuration
- System Center Management
Then watch the Operations Manager log for 10–15 minutes:

If the problem was only missing Run As credentials, the 29120 “Padding is invalid…” events should stop after configuration is rebuilt.
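A minimal Python sketch of the "watch the log" check, assuming you already have (timestamp, event ID) pairs from the Operations Manager log covering the period after the restart. The function name `errors_after` and the two-minute `grace` window are hypothetical; the text above only says to watch for 10–15 minutes, so adjust the window to match.

```python
from datetime import datetime, timedelta

# Errors that should stop once the configuration has been rebuilt.
WATCHED_IDS = {21023, 29120}


def errors_after(events, restart_time, grace=timedelta(minutes=2)):
    """Return watched (timestamp, event_id) pairs logged after restart_time + grace.

    grace is a hypothetical allowance for errors emitted while the services
    are still starting and requesting configuration.
    """
    cutoff = restart_time + grace
    return [(ts, eid) for ts, eid in events if eid in WATCHED_IDS and ts >= cutoff]


restart = datetime(2025, 12, 10, 14, 0)
log = [
    (restart + timedelta(minutes=1), 29120),   # inside the grace period
    (restart + timedelta(minutes=12), 29120),  # still failing
    (restart + timedelta(minutes=13), 1102),   # not a watched ID
]
print(len(errors_after(log, restart)))  # 1
```

A non-empty result after the watch window suggests the problem is not only missing Run As credentials.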

5. Flush the Health Service cache on the management servers

If 21023 / 29120 continues, force a clean configuration snapshot on each management server.

On each management server:

Stop the System Center Management service.
Delete the Health Service cache folder (default path):
%ProgramFiles%\Microsoft System Center\Operations Manager\Server\Health Service State
Start the System Center Management service again.

Clearing this folder is the supported way to flush the Health Service cache; it is a standard step when fixing broken agents and applies to management servers as well.
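The cache flush can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: `default_cache_dir` and `flush_cache` are hypothetical helpers, the path is the default one quoted above (a custom install location will differ), and stopping and starting the System Center Management service is left to you. `dry_run` defaults to True so that nothing is deleted by accident.

```python
import os
import shutil
from pathlib import Path


def default_cache_dir():
    """Default Health Service State path on a management server (see step 5)."""
    program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
    return (Path(program_files) / "Microsoft System Center" / "Operations Manager"
            / "Server" / "Health Service State")


def flush_cache(cache_dir, dry_run=True):
    """Delete the Health Service State folder and report whether it existed.

    Assumes the System Center Management (HealthService) service has already
    been stopped; with dry_run=True nothing is deleted.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(cache_dir)
    return True
```

For example, after stopping the service, `flush_cache(default_cache_dir(), dry_run=False)` would remove the folder and return True, or return False if the folder does not exist at that path.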

After this:

Monitor the Operations Manager log.
You should see events that the Health Service is requesting and loading configuration and management packs.
Check whether 21023 and 29120 still reappear.

6. Verify the secure-storage / encryption situation

Decryption of Run As credentials relies on an encryption key that was created when the first management server in the management group was installed. That key is stored in the registry and normally copied to all other management servers; it is then used to decrypt the secure data stored in the database. If that key is lost or changed (for example due to a rebuild that didn’t preserve the registry), SCOM can no longer decrypt Run As credentials and you get exactly this kind of exception.

Given your sequence (failed upgrade → DB restore → /Recover on one MS):

- Think about whether any of the management servers, especially the first MS installed in the MG, were rebuilt from scratch or restored from snapshots after the DB restore.
- If you have an older backup or snapshot of a previously healthy management server (ideally the first one installed in the MG), you can:
  - compare its SCOM secure-storage-related registry keys with those on the current servers, and
  - if they differ, follow the guidance from Kevin Holman’s article “Recovering a SCOM management server” to restore the original encryption key.

If the OS itself was never rebuilt and only an in-place SCOM upgrade was attempted, the key is probably still intact and the more likely issue is still in Run As credentials or cache, so focus on steps 4–5 first.
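To make the registry comparison in step 6 less error-prone, a Python sketch like the following could dump the values of one registry key on each server and diff the results. It is an assumption-laden illustration: `read_values` and `diff_values` are hypothetical helper names, `read_values` relies on Python's Windows-only `winreg` module, and the exact secure-storage key path is deliberately not given because the thread does not name it. For an older backup or snapshot, you would run the same helper on the restored copy of that server.

```python
def read_values(subkey):
    """Read all values under HKEY_LOCAL_MACHINE\\<subkey> into a dict (Windows only).

    The subkey to pass is deliberately not specified in this sketch.
    """
    import winreg  # standard library, available on Windows only

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        _, value_count, _ = winreg.QueryInfoKey(key)
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name] = data
    return values


def diff_values(backup, current):
    """Compare two {value_name: data} snapshots of the same registry key.

    Returns {value_name: (backup_data, current_data)} for each name whose data
    differs or that exists on only one side (None marks the missing side).
    """
    names = backup.keys() | current.keys()
    return {
        name: (backup.get(name), current.get(name))
        for name in sorted(names)
        if backup.get(name) != current.get(name)
    }
```

An empty result from `diff_values` means the two snapshots of that key match; any entries point at the values worth examining before following the restore guidance.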

7. Sanity check: database connectivity and accounts

In parallel, validate that there are no basic SQL issues:

On the management servers:
- Look for events 31551, 31552, 31553, and 26319 in the Operations Manager log. These would indicate that SCOM cannot query or write to the Operational or Data Warehouse databases.

On the SQL server:
- Check the SQL Server error log for failed logins by SCOM service accounts.
- Make sure the SCOM SDK/Config service account and DW accounts still have the correct database roles (as documented for SCOM 2019: OpsDB rights, DW data reader/writer, etc.).

If you see SQL connectivity errors, resolve those first – the configuration service cannot build valid config if it cannot read or write the OpsMgr DB.

8. Only after 2019 is stable – prepare a new 2019 → 2022 upgrade attempt

Once:
- all management servers are Healthy,
- there are no more repeating 21023 / 29120 errors, and
- the console behaves normally,

you can treat the environment as a healthy 2019 UR6 management group again and start preparing the upgrade again.

At that point, make sure you:
- Run the official scripts / procedures to detect and remediate duplicate management pack aliases, which are a known cause of 2019→2022 upgrade failures at the “Configure Operational Database” step.
- Re-check the 2019→2022 checklist you used initially to ensure nothing was missed.

If you post back (in the forum thread) the outcome of steps 3–5 – especially which event IDs remain after re-entering all Run As credentials and clearing the Health Service cache – it will be much easier to decide whether you “only” had stale credentials, or whether the secure-storage key itself is out of sync and needs deeper remediation.

Stoyan Chalakov

