SCOM Management server recovery

Seyedmajid Taheri 51 Reputation points
2025-10-02T10:31:19.8266667+00:00

Hi guys.

In my test environment, I have SCOM 2019 UR6 consisting of three Management Servers, four Gateway Servers, one server for the Data Warehouse database, and one server for the Operational database.

Yesterday, I attempted to perform an in-place upgrade to SCOM 2022. I followed the required pre-upgrade steps according to Microsoft’s documentation and Kevin Holman’s blog.

When I tried to upgrade the first Management Server, the wizard failed at the "Configure Operational Database" step, and then the Management Server was automatically removed from the system. After that, the other two Management Servers also went down.

To recover the environment, I first restored both the Operational Database and the Data Warehouse Database to their pre-upgrade state. Then, I recovered the first failed Management Server using the /Recover command, and I was able to reconnect the console.

Afterward, I re-entered the password for the Management Server Action Account in the console. However, in the Event Viewer on all Management Servers, I am still seeing the event shown below.

Could you please help me resolve this issue?

Thank you.

OpsMgr has no configuration for management group SCOMMGTEST and is requesting new configuration from the Configuration Service.
OpsMgr Management Configuration Service failed to process configuration request (Xml configuration file or management pack request) due to the following exception
Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.HealthServicePublicKeyNotRegisteredException: Padding is invalid and cannot be removed.
Server stack trace: 
       at Microsoft.EnterpriseManagement.RuntimeService.RootConnectorMethods.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.RuntimeService.SDKReceiver.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)
       at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs)
       at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg)
    
    Exception rethrown at [0]: 
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at Microsoft.EnterpriseManagement.Mom.Internal.ISdkService.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Communication.CredentialDataProvider.GetSecureDataUnwrapped(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Communication.CredentialDataProvider.GetSecureData(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.TracingCredentialDataProvider.GetSecureData(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.WriteSecureData(AgentConfigurationStream stream, XmlWriter writer, Guid agentId, Hashtable credentialAssociationList, Hashtable credentialList)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.WriteSnapshotState(AgentConfigurationStream stream, XmlWriter writer, AgentValidatedConfiguration validatedConfig)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.GetSnapshotConfigurationStream(AgentValidatedConfiguration validatedConfig, AgentConfigurationCookie oldCookie, AgentConfigurationCookie& newCookie)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationBuilder.FormatConfig(ConfigurationRequestDescriptor requestDescriptor, IAgentConfiguration agentConfig)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentRequestProcessor.ProcessConfigurationRequest(ICollection`1 requestList, Int32& processedRequestsCount)
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentRequestProcessor.Execute()
       at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.ThreadManager.ResponseThreadStart(Object state)
System Center Operations Manager
A family of System Center products that provide infrastructure monitoring, help ensure the predictable performance and availability of vital applications, and offer comprehensive monitoring for datacenters and cloud, both private and public.

2 answers

  1. SChalakov 10,666 Reputation points MVP Volunteer Moderator
    2025-12-10T13:58:18.1433333+00:00

    Hi,

    Were you able to resolve the issue? Could you please give us an update? If not, I will try to help you out.

    Regards

    Stoyan


  2. SChalakov 10,666 Reputation points MVP Volunteer Moderator
    2025-12-11T10:13:24.35+00:00

    Hi,

    Based on the symptoms and your recovery steps, what you are seeing now is no longer the original upgrade problem but a configuration/crypto issue in the recovered SCOM 2019 management group.

    The exception

    HealthServicePublicKeyNotRegisteredException: Padding is invalid and cannot be removed
    

     is thrown when the Management Configuration Service tries to build configuration that contains secure data (Run As credentials, action accounts, etc.) but cannot decrypt that data with the current encryption keys. This is a known pattern after a restore/recovery when not all Run As / service account credentials or keys are in a consistent state.

    Because the exact cause is not obvious from a single event, you will need some generic health checks first, then a focused pass on Run As accounts and secure storage. Below is a step-by-step plan; several steps only collect more information.

    1. Baseline – confirm the recovered state of the management group

    1. In the console, check Help > About and verify that:
      • All management servers report version SCOM 2019 UR6, not 2022.
    2. On the SQL server:
      • Confirm that the OperationsManager and OperationsManagerDW databases are both restored from backups taken immediately before you started the 2022 upgrade (same point in time).
    3. On each management server:
      • In “Apps & Features” / “Programs and Features”, verify you only see SCOM 2019 components, no leftover SCOM 2022 setup.

    This ensures you truly have a clean 2019 UR6 environment to stabilize.

    2. Check core services and All Management Servers Resource Pool

    On each management server, make sure the following services have Startup type = Automatic and Status = Running:

    • System Center Data Access Service (OMSDK)
    • System Center Management Configuration (cshost)
    • System Center Management (HealthService)

    Then in the console:

    • Go to Monitoring > Operations Manager > Management Server State and All Management Servers Resource Pool.
    • Note whether the management servers are Healthy or Grey/Unhealthy, and whether there are any alerts like “All Management Servers Resource Pool Unavailable”.

    If the pool is completely down, everything else will be affected, so this is the first thing to get back to green.

     

    3. Targeted event log review on all management servers

    On each management server:

    1. Open Event Viewer > Applications and Services Logs > Operations Manager.
    2. Create a custom view or filter for:
      • Time range: last 1–2 hours
      • Event IDs: 21023, 29120, 20070, 20071, 21016, 26319, 31551–31553, 1102, 1103
    3. Check:
      • Are 21023 (“has no configuration…”) and 29120 (HealthServicePublicKeyNotRegisteredException) the main repeating errors?
      • Or do you also see:
        • SQL / login / timeout errors (31551–31553, 26319),
        • Run As logon failures,
        • or agent-connectivity issues (20070, 21016)?

    Make a note of the exact set of repeating event IDs – this will show whether we are dealing with a pure secure-data problem or a broader connectivity issue.
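
To make that triage less manual, the grouping described above can be sketched in a few lines of Python. The category mapping only covers the event IDs that this post explicitly assigns to a group; everything else in the filter (20071, 1102, 1103) is counted as "other". The function names are hypothetical, and the input is assumed to be a plain list of event IDs that you exported from the Operations Manager log by whatever means you prefer.

```python
from collections import Counter

# Event ID -> category, using only the groupings named in this step.
EVENT_CATEGORIES = {
    21023: "configuration / secure data",
    29120: "configuration / secure data",
    31551: "SQL / database",
    31552: "SQL / database",
    31553: "SQL / database",
    26319: "SQL / database",
    20070: "agent connectivity",
    21016: "agent connectivity",
}


def summarize(event_ids):
    """Count how many events fall into each category."""
    return Counter(EVENT_CATEGORIES.get(eid, "other") for eid in event_ids)


def looks_like_pure_secure_data_problem(event_ids):
    """True if configuration / secure data events occur and no SQL or
    agent-connectivity events are present."""
    counts = summarize(event_ids)
    return (
        counts["configuration / secure data"] > 0
        and counts["SQL / database"] == 0
        and counts["agent connectivity"] == 0
    )
```

For example, `summarize([21023, 29120, 29120, 31552])` counts three configuration / secure data events and one SQL / database event, which would point to a broader issue than secure data alone.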

     

    4. Re-enter all Run As and service account passwords

    After a DB restore or management server recovery, it is mandatory to re-enter the passwords for every Run As account, not just the Management Server Action Account. Missing or stale Run As credentials after recovery are a documented cause of this exact error.

     In the SCOM console:

    1. Go to Administration > Run As Configuration > Accounts.
    2. For each relevant account:
      • Action Account (Management Server Action Account)
      • Data Warehouse Write Account
      • Data Warehouse Read Account
      • Any additional Windows Run As accounts (SQL monitoring, network devices, Unix/Linux, custom MPs, connectors, etc.)

      do the following:
      • Open Properties.
      • On the credentials page, re-enter the password (even if you are sure it is correct).
      • Finish the wizard without changing distribution or scope.
    3. Still under Run As Configuration, open Profiles and check the most important ones (e.g. Data Warehouse Account, SDK and Config Service Account, application packs):
      • Verify that the expected Run As account is still associated with each profile and scope.
    4. On each management server, restart the services in this order:
      • System Center Data Access Service
      • System Center Management Configuration
      • System Center Management

    Then watch the Operations Manager log for 10–15 minutes:

    • If the problem was only missing Run As credentials, the 29120 “Padding is invalid…” events should stop after configuration is rebuilt.

     

    5. Flush the Health Service cache on the management servers

    If 21023 / 29120 continues, force a clean configuration snapshot on each management server.

    On each management server:

    1. Stop the System Center Management service.
    2. Delete the Health Service cache folder (default path):
       %ProgramFiles%\Microsoft System Center\Operations Manager\Server\Health Service State
    3. Start the System Center Management service again.

    Clearing this folder is the supported way to flush the Health Service cache; it is a standard step when fixing broken agents and applies to management servers as well.
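
If you have to repeat this on several management servers, the stop / delete / start sequence can be scripted. The following is only a sketch under a few assumptions: the service short name is HealthService (as listed in step 2), the folder is the default path shown above, and the script runs in an elevated session. The function names are hypothetical.

```python
import os
import shutil
import subprocess
from pathlib import Path

SERVICE = "HealthService"  # short name of the System Center Management service


def default_state_dir():
    """Default cache folder from the step above; adjust if SCOM is installed elsewhere."""
    program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
    return (Path(program_files) / "Microsoft System Center"
            / "Operations Manager" / "Server" / "Health Service State")


def flush_health_service_cache(state_dir, run=subprocess.run):
    """Stop the service, delete the cache folder, then start the service again."""
    state_dir = Path(state_dir)
    # 'net stop' may return a non-zero code if the service was not running,
    # so the result is deliberately not treated as fatal here.
    run(["net", "stop", SERVICE])
    try:
        if state_dir.exists():
            shutil.rmtree(state_dir)
    finally:
        # Start the service again even if deleting the folder failed.
        run(["net", "start", SERVICE])
```

On a management server you would call `flush_health_service_cache(default_state_dir())`. The `run` parameter is there so the sequence can be dry-run with a stand-in function before touching a real server.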

     After this:

    • Monitor the Operations Manager log.
    • You should see events that the Health Service is requesting and loading configuration and management packs.
    • Check whether 21023 and 29120 still reappear.

     

    6. Verify the secure-storage / encryption situation

    Decryption of Run As credentials relies on an encryption key that was created when the first management server in the management group was installed. That key is stored in the registry and normally copied to all other management servers; it is then used to decrypt the secure data stored in the database. If that key is lost or changed (for example due to a rebuild that didn’t preserve the registry), SCOM can no longer decrypt Run As credentials and you get exactly this kind of exception.

    Given your sequence (failed upgrade → DB restore → /Recover on one MS):

    • Think about whether any of the management servers – especially the original first MS in the MG – were rebuilt from scratch or restored from snapshots after the DB restore.
    • If you have an older backup or snapshot of a previously healthy management server (ideally the first one installed in the MG), you can:
      • Compare its SCOM secure-storage–related registry keys with the current servers, and
      • If they differ, follow the guidance from Kevin Holman’s article “Recovering a SCOM management server” to restore the original encryption key.
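
For the comparison itself, one option is to export the relevant key from both servers with `reg export` and diff the two files. The sketch below parses the text of such exports and reports values that differ. It does not assume any particular registry path (this post does not name one, so take it from the article mentioned above); the function names are hypothetical, and the parser is deliberately simple (it assumes value names contain no "=" character).

```python
def parse_reg_text(text):
    """Parse the text of a .reg export into {(key_path, value_name): raw_data}."""
    values = {}
    key = None
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):
            # Long hex values continue on the next line after a trailing backslash.
            pending = line[:-1]
            continue
        pending = ""
        if line.startswith("[") and line.endswith("]"):
            key = line[1:-1]
        elif key and "=" in line and line[:1] in ('"', "@"):
            name, _, data = line.partition("=")
            values[(key, name.strip('"'))] = data
    return values


def diff_reg(old, new):
    """Return {(key, name): (old_data, new_data)} for values that differ or are missing."""
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(set(old) | set(new))
        if old.get(k) != new.get(k)
    }
```

Files written by `reg export` are normally UTF-16 encoded, so reading them would look like `Path("healthy.reg").read_text(encoding="utf-16")` (the file name is a placeholder). An empty result from `diff_reg` means the exported values match.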

    If the OS itself was never rebuilt and only an in-place SCOM upgrade was attempted, the key is probably still intact and the more likely issue is still in Run As credentials or cache, so focus on steps 4–5 first.

     

    7. Sanity check: database connectivity and accounts

    In parallel, validate that there are no basic SQL issues:

    1. On management servers:
      • Look for events 31551, 31552, 31553, 26319 in the Operations Manager log.
      • These would indicate that SCOM cannot query or write to the Operational or Data Warehouse databases.
    2. On the SQL server:
      • Check the SQL Server error log for failed logins of SCOM service accounts.
      • Make sure the SCOM SDK/Config service account and DW accounts still have the correct database roles (as documented for SCOM 2019 – OpsDB rights, DW data reader/writer, etc.).

    If you see SQL connectivity errors, resolve those first – the configuration service cannot build valid config if it cannot read or write the OpsMgr DB.

     

    8. Only after 2019 is stable – prepare a new 2019 → 2022 upgrade attempt

    Once:

    • All management servers are Healthy,
    • There are no more repeating 21023 / 29120 errors, and
    • The console behaves normally,

    then you can treat the environment as a healthy 2019 UR6 management group again and start to re-prepare the upgrade.

    At that point, make sure you:

    • Run the official scripts / procedures to detect and remediate duplicate management pack aliases, which are a known cause of 2019→2022 upgrade failures at the “Configure Operational Database” step.
    • Re-check the 2019→2022 checklist you used initially to ensure nothing was missed.

     

    If you post back (in the forum thread) the outcome of steps 3–5 – especially which event IDs remain after re-entering all Run As credentials and clearing the Health Service cache – it will be much easier to decide whether you “only” had stale credentials, or whether the secure-storage key itself is out of sync and needs deeper remediation.

     

    Stoyan Chalakov

    "If my response was useful, please consider marking it as the answer. It keeps the forum clean, structured, and more helpful for everyone. Thank you for supporting the community."
