Troubleshoot solution updates for Azure Local

Applies to: Hyperconverged deployments of Azure Local

This article describes how to troubleshoot solution updates that you apply to your Azure Local instance to keep it up to date.

About troubleshooting updates

If you deploy your system by using a new Azure Local installation, you install the orchestrator as part of that deployment. The orchestrator manages platform updates, including the OS, drivers and firmware, and agents and services.

The new update solution includes retry and remediation logic. This logic attempts to fix update problems in a nondisruptive way, such as retrying a Cluster-Aware Update (CAU) run. If an update run can't be remediated automatically, it fails. When an update fails, inspect the details for the failure message to determine the appropriate next action. You can attempt to resume the update, if appropriate, to determine if a retry resolves the problem.

Troubleshoot readiness checks

Readiness checks are essential to ensure that you apply updates smoothly, keep your systems up-to-date, and maintain correct system functionality. Readiness checks are performed and reported separately in two scenarios:

  • System health checks that run once every 24 hours.

  • Update readiness checks that run after downloading the update content and before beginning the installation.

It is common for the results of system health checks and update readiness checks to differ. This happens because update readiness checks use the latest validation logic from the solution update to be installed, while system health checks always use validation logic from the installed version.

Both system health checks and update readiness checks perform similar validations and classify each result into one of three severity levels: Critical, Warning, and Informational.

  • Critical: Readiness checks that prevent you from applying the update. This status indicates issues that you must resolve before proceeding with the update.
  • Warning: Readiness checks that also prevent you from applying the update, but that you can bypass. This status indicates potential issues that might not be severe enough to stop the update but that you should address to ensure a smooth update process.
  • Informational: Readiness checks that don't block the update. This status provides information about the system's state and any potential issues that shouldn't affect the update process directly. These checks are for your awareness and might not require immediate action.
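
As a rough illustration of how the three severities affect an update, the following sketch maps each failing check to its impact. The sample objects and the Get-UpdateImpact helper are hypothetical. The Status and Severity property names match the PowerShell output shown later in this article, and the CRITICAL and WARNING strings are assumed to follow the same uppercase pattern as INFORMATIONAL.

```powershell
# Hypothetical sample results; on a real system these would come from the
# HealthCheckResult property returned by Get-SolutionUpdateEnvironment.
$results = @(
    [pscustomobject]@{ Title = 'Example critical check'; Status = 'FAILURE'; Severity = 'CRITICAL' }
    [pscustomobject]@{ Title = 'Example warning check';  Status = 'FAILURE'; Severity = 'WARNING' }
    [pscustomobject]@{ Title = 'Example info check';     Status = 'FAILURE'; Severity = 'INFORMATIONAL' }
    [pscustomobject]@{ Title = 'Example passing check';  Status = 'SUCCESS'; Severity = 'CRITICAL' }
)

# Hypothetical helper: translates a severity into its effect on the update.
function Get-UpdateImpact {
    param([Parameter(Mandatory)][string]$Severity)
    switch ($Severity) {
        'CRITICAL'      { 'Blocks update' }
        'WARNING'       { 'Blocks update unless bypassed' }
        'INFORMATIONAL' { 'Does not block update' }
        default         { 'Unknown' }
    }
}

# Keep only checks that didn't succeed, and add an Impact column.
$summary = $results |
    Where-Object { $_.Status -ne 'SUCCESS' } |
    Select-Object Title, Severity, @{ Name = 'Impact'; Expression = { Get-UpdateImpact -Severity $_.Severity } }

$summary | Format-Table -AutoSize
```

Passing checks are filtered out first because severity only matters for checks that didn't succeed.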

The troubleshooting steps vary depending on the readiness check scenario.

Using the Azure portal

Using the Azure portal, scenario 1: System health checks

This scenario occurs when preparing to install system updates in Azure Update Manager:

  1. In the system list, select the Update readiness status. Systems that require attention show a Critical or Warning state.

    Screenshot of Update Manager page.

  2. Review the list of readiness checks and their results.

    1. Select the links under Details to view more information.

    2. When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.

      Screenshot of Readiness checks with multiple critical errors, highlighting a storage health check failure and its detailed error message with remediation guidance.

    3. Follow the remediation steps provided to resolve any failures.

    4. After resolving the issues, select Check again to rerun the readiness checks.

      Screenshot of Readiness checks showing the Check again button.

Using the Azure portal, scenario 2: Update readiness checks

This scenario occurs when you install and track system updates in Azure Update Manager:

  1. In History, select the failed update run from the list.

  2. On the Validate that the system is ready tab, review the list of readiness checks and their results.

    1. Select the View details links under Details.

    2. When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.

    Screenshot of Update progress page.

    To resolve the failures, follow the remediation instructions, and then select the Try again button to retry the pre-update readiness checks and Resume the update.

    To further troubleshoot, see the PowerShell section.

Using PowerShell

Using PowerShell, scenario 1: System health checks

To troubleshoot system health checks by using PowerShell:

  1. To validate that the system health checks failed, run the following command on one of the machines in your system:

    Get-SolutionUpdateEnvironment
    

    Here's a sample output:

    PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment 
    ResourceId        : redmond  
    SbeFamily         : VirtualForTesting  
    HardwareModel     : Virtual Machine  
    LastChecked       : 9/12/2023 10:34:42 PM  
    PackageVersions   : {Solution: 10.2309.0.20, Services: 10.2309.0.20, Platform: 1.0.0.0, SBE: 4.0.0.0}  
    CurrentVersion    : 10.2309.0.20  
    CurrentSbeVersion : 4.0.0.0  
    LastUpdated       :  
    State             : AppliedSuccessfully  
    HealthState       : Failure 
    HealthCheckResult : {Storage Pool Summary, Storage Services Physical Disks Summary, Storage Services Physical Disks
                        Summary, Storage Services Physical Disks Summary...}
    HealthCheckDate   : 9/12/2023 7:03:32 AM
    AdditionalData    : {[SBEAdditionalData, Solution Builder extension is partially installed. Please install the latest
                        Solution Builder Extension provided by your hardware vendor.
                        For more information, see https://aka.ms/SBE.]}
    
    HealthState       : Success  
    HealthCheckResult : {}  
    HealthCheckDate   : 8/4/2022 9:10:36 PM 
    
    PS C:\Users\lcmuser>
    
  2. Review the HealthState value for your system. A value of Failure or Warning indicates that one or more health checks need attention.

  3. To filter the HealthCheckResult property to identify failing tests, run the following command:

    $result = Get-SolutionUpdateEnvironment
    
    $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | Format-List Title, Status, Severity, Description, Remediation
    

    Here's a sample output:

    Title       : The machine proxy on each failover cluster node should be set to a local proxy server 
    Status      : FAILURE 
    Severity    : INFORMATIONAL 
    Description : Validating cluster setup for update. 
    Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements#
                  tests-for-cluster-updating-readiness
    
    Title       : The CAU clustered role should be installed on the failover cluster to enable self-updating mode 
    Status      : FAILURE 
    Severity    : INFORMATIONAL 
    Description : Validating cluster setup for update. 
    Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements#
                  tests-for-cluster-updating-readiness
    
  4. Review the Remediation property for the failed tests and take action as appropriate to resolve the failures.

  5. If you need more diagnostic information to determine why tests failed, examine the AdditionalData property by using the -FullHealthCheckDetails parameter:

    $FullResults = Get-SolutionUpdateEnvironment -FullHealthCheckDetails
    
    $Failures = $FullResults.HealthCheckResult | Where-Object { $_.Status -ne "SUCCESS" -and $_.Severity -ne "INFORMATIONAL" }
    
    $Failures | Format-List *
    
  6. If available, use the diagnostic information shown in the AdditionalData property, such as FailedMachines, Source, and ExceptionMessage, to determine which physical machines are causing the test failure. Then use the link in the Remediation property to resolve the failures.

  7. After you resolve the failure, invoke the system health checks again by running the following command:

    Invoke-SolutionUpdatePrecheck -SystemHealth
    
  8. Use Get-SolutionUpdateEnvironment to confirm that the failing health checks are resolved. It might take a few minutes for the system health checks to run.

    Here's a sample output:

    PS C:\Users\lcmuser>  Get-SolutionUpdateEnvironment | Format-List HealthState, HealthCheckResult, HealthCheckDate 
    
    HealthState       : InProgress 
    HealthCheckResult : 
    HealthCheckDate   : 1/1/0001 12:00:00 AM 
    
    PS C:\Users\lcmuser>  Get-SolutionUpdateEnvironment | Format-List HealthState, HealthCheckResult, HealthCheckDate
    
    HealthState       : Success
    HealthCheckResult : {Storage Pool Summary, Storage Subsystem Summary, Storage Services Summary, Storage Services
                        Summary...}
    HealthCheckDate   : 10/18/2024 11:56:49 PM
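
Because the checks can take a few minutes, you might prefer to poll instead of rerunning the command by hand. The following is a minimal sketch of that idea, not a supported tool. The Wait-HealthCheck function, its parameters, and the default retry values are hypothetical. It takes a script block so that you can try it without a cluster; on a real system you would pass { Get-SolutionUpdateEnvironment }.

```powershell
# Hypothetical helper: polls until HealthState is no longer InProgress.
function Wait-HealthCheck {
    param(
        [Parameter(Mandatory)][scriptblock]$GetEnvironment,
        [int]$MaxAttempts = 20,    # arbitrary default
        [int]$DelaySeconds = 30    # arbitrary default
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $environment = & $GetEnvironment
        if ($environment.HealthState -ne 'InProgress') {
            return $environment
        }
        Start-Sleep -Seconds $DelaySeconds
    }
    throw "Health checks were still in progress after $MaxAttempts attempts."
}

# Stand-in for Get-SolutionUpdateEnvironment that reports InProgress twice
# and then Success, so the sketch runs anywhere.
$script:calls = 0
$fakeEnvironment = {
    $script:calls++
    $state = if ($script:calls -lt 3) { 'InProgress' } else { 'Success' }
    [pscustomobject]@{ HealthState = $state }
}

$final = Wait-HealthCheck -GetEnvironment $fakeEnvironment -DelaySeconds 0
$final.HealthState
```

The function returns on any state other than InProgress, so a Failure or Warning result is handed back to the caller for review instead of being retried.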
    

Using PowerShell, scenario 2: Update readiness checks

When update readiness checks fail, the system update also fails. To troubleshoot update readiness checks by using PowerShell:

  1. To verify that update readiness checks failed, run the following command on one machine in your system to find the ResourceId of the update:

    Get-SolutionUpdate | Format-Table ResourceId, State, Version
    

    Here's a sample output:

    PS C:\Users\lcmuser> Get-SolutionUpdate | Format-Table ResourceId, State, Version
    
    ResourceId                     State                  Version 
    ----------                     -----                  -------
    redmond/Solution10.2405.2.7    HealthCheckFailed      10.2405.2.7
    
    
  2. Run the following command. Replace <Resource ID> with the update ResourceId from the previous command's output:

    Get-SolutionUpdate -Id <Resource ID> | Format-Table Version, State, HealthCheckResult
    

    Here's a sample output:

    PS C:\Users\lcmuser> Get-SolutionUpdate -Id redmond/Solution10.2405.2.7 | Format-Table Version, State, HealthCheckResult 
    
    Version     State              HealthCheckResult 
    -------     -----              ----------------- 
    10.2405.2.7 HealthCheckFailed {Storage Subsystem Summary, Storage Pool Summary, Storage Services Physical Disks Summary, Stora...                       
    
    PS C:\Users\lcmuser>
    
  3. Review the State for the update and view the HealthCheckResult value.

  4. To filter the HealthCheckResult property and identify failed tests, run the following command. Replace the Id value with the update ResourceId from the previous command.

    $result = Get-SolutionUpdate -Id <Resource ID>
    $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | Format-List Title, Status, Severity, Description, Remediation
    

    Here's a sample output:

    Title       : The machine proxy on each failover cluster node should be set to a local proxy server 
    Status      : FAILURE 
    Severity    : INFORMATIONAL 
    Description : Validating cluster setup for update. 
    Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# 
              tests-for-cluster-updating-readiness 
    
    Title       : The CAU clustered role should be installed on the failover cluster to enable self-updating mode 
    Status      : FAILURE 
    Severity    : INFORMATIONAL 
    Description : Validating cluster setup for update. 
    Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# 
              tests-for-cluster-updating-readiness
    
  5. Use the link in the Remediation property for the failed test. Open the article on a device with a web browser, and follow the recommended steps to resolve the failures.

  6. If you need more diagnostic information to determine why tests failed, examine the AdditionalData property by using the -FullHealthCheckDetails parameter:

    $FullResults = Get-SolutionUpdate -Id <Resource ID> -FullHealthCheckDetails
    
    $Failures = $FullResults.HealthCheckResult | Where-Object { $_.Status -ne "SUCCESS" -and $_.Severity -ne "INFORMATIONAL" }
    
    $Failures | Format-List *
    
  7. If available, use diagnostic details in the AdditionalData property, such as FailedMachines, Source, and ExceptionMessage, to identify which physical machines caused the test failure. Then use the link in the Remediation property to resolve the failures.

  8. After you resolve the failure, invoke the update readiness checks again by running the following command:

    Get-SolutionUpdate -Id <Resource ID> | Start-SolutionUpdate -PrepareOnly
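
When many checks fail, it can help to count the failures per severity before deciding what to fix first. The sketch below uses a made-up stand-in for the object that Get-SolutionUpdate returns. The property names follow the sample output above, while the titles, the severity assigned to each title, and the CRITICAL string are assumptions for illustration only.

```powershell
# Hypothetical stand-in for: Get-SolutionUpdate -Id <Resource ID>
$update = [pscustomobject]@{
    ResourceId        = 'redmond/Solution10.2405.2.7'
    State             = 'HealthCheckFailed'
    HealthCheckResult = @(
        [pscustomobject]@{ Title = 'Example storage check'; Status = 'FAILURE'; Severity = 'CRITICAL' }
        [pscustomobject]@{ Title = 'Example proxy check';   Status = 'FAILURE'; Severity = 'INFORMATIONAL' }
        [pscustomobject]@{ Title = 'Example CAU check';     Status = 'FAILURE'; Severity = 'INFORMATIONAL' }
        [pscustomobject]@{ Title = 'Example passing check'; Status = 'SUCCESS'; Severity = 'CRITICAL' }
    )
}

# Group the checks that didn't succeed by severity and show the counts.
$bySeverity = $update.HealthCheckResult |
    Where-Object { $_.Status -ne 'SUCCESS' } |
    Group-Object -Property Severity

$bySeverity | Sort-Object Name | Format-Table Name, Count -AutoSize
```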
    

Troubleshoot update failures

When an update fails, review the detailed step progress to identify the point of failure. This review helps you determine whether you can resolve the issue by repairing and resuming the update or whether you need to contact support. Look for the following details:

  • Failing step name and description.

  • The machine or server where the step failed (if there's a machine-specific issue).

  • Failure message string (might pinpoint the issue to a specific known issue with documented remediation).

Use the Azure portal to identify failing step details, as described in Resume an update. Alternatively, see View update summary report for similar details in PowerShell by using Start-MonitoringActionplanInstanceToComplete.

The following table lists update failure scenarios and remediation guidelines.

| Step names | Type of issue | Remediation |
| --- | --- | --- |
| Any | Power loss or other similar interruption to the system during the update. | 1. Restore power. 2. Run a system health check. 3. Resume the update. |
| CAU updates | Cluster-Aware Update (CAU) update run fails with a max retries exceeded failure. | If multiple CAU attempts fail, investigate the first failure. Use the start and end time of the first failure to match up with the correct Get-CauReport output to further investigate the failure. |
| Any | Memory, power supply, boot driver, or similar critical failure on one or more nodes. | See Repair a node on Azure Local for how to repair the failing node. After the node is repaired, resume the update. |

Collect update logs

You can also collect diagnostic logs to help Microsoft identify and fix the issues.

To collect logs for updates by using the Azure portal, see Resume an update.

To collect logs for the update failures, see Collect diagnostic logs for Azure Local.

View update summary report

To view a detailed update summary report by using PowerShell, follow these steps on the client that you're using to access your system:

  1. Establish a remote PowerShell session with the machine. Run PowerShell as an administrator and run the following command:

    Enter-PSSession -ComputerName <machine_IP_address> -Credential <username\password for the machine>
    
  2. Get all the solution updates, and then filter for the update that corresponds to a specific version. Use the version of the solution update that failed to install.

    $Update = Get-SolutionUpdate | ? Version -eq "<Version string>" -verbose
    
  3. Identify the action plan for the failed solution update run.

    $Failure = $Update | Get-SolutionUpdateRun
    
  4. Display the run details to identify the ResourceId of the update run.

    $Failure
    

    Here's a sample output:

    PS C:\Users\lcmuser> $Update = Get-SolutionUpdate | ? Version -eq "10.2303.1.7" -verbose
    PS C:\Users\lcmuser> $Failure = $Update | Get-SolutionUpdateRun
    PS C:\Users\lcmuser> $Failure
    
    ResourceId      : redmond/Solution10.2303.1.7/a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1
    Progress        : Microsoft.AzureStack.Services.Update.ResourceProvider.UpdateService.Models.Step
    TimeStarted     : 4/21/2023 10:02:54 PM
    LastUpdatedTime : 4/21/2023 3:19:05 PM
    Duration        : 00:16:37.9688878
    State           : Failed
    

    Note the GUID at the end of the ResourceId value. This GUID corresponds to the ActionPlanInstanceID.

  5. View the summary for the ActionPlanInstanceID that you noted earlier.

    Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID <Action Plan Instance ID>
    

    Here's a sample output:

    PS C:\Users\lcmuser> Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1
    

    Screenshot of PowerShell collect logs output.
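
Instead of copying the GUID by hand, you can split it off the ResourceId. The following sketch assumes the ResourceId always ends with the GUID after the last slash, as it does in the sample output above. On a real system you would read the value from $Failure.ResourceId instead of the hard-coded sample string.

```powershell
# ResourceId copied from the sample output above.
$resourceId = 'redmond/Solution10.2303.1.7/a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1'

# Take the last '/'-separated segment as the action plan instance ID.
$actionPlanInstanceId = ($resourceId -split '/')[-1]

# Confirm that the segment parses as a GUID before using it.
$parsed = [guid]::Empty
if (-not [guid]::TryParse($actionPlanInstanceId, [ref]$parsed)) {
    throw "Unexpected ResourceId format: $resourceId"
}

$actionPlanInstanceId
```

You can then pass $actionPlanInstanceId to the -actionPlanInstanceID parameter shown in step 5.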

Resume an update

To resume a previously failed update run, retry the update run through the Azure portal or PowerShell.

The Azure portal

Use the Azure portal to browse to your failed update and select the Try again button. You can find this functionality at the Download updates, Validate that the system is ready, and Install updates stages of an update run.

Screenshot of the retry failed update button.

If you can't successfully rerun a failed update or need to troubleshoot an error further, follow these steps:

  1. Select View details for the error.

  2. When the details box opens, review the error details. For more information on collecting diagnostic logs, select the How to collect logs link near the Open a support ticket button.

    Screenshot to download error logs.

    For more information on retrieving logs, see Collect diagnostic logs for Azure Local.

  3. Select the Open a support ticket button. Fill in the appropriate information, and attach your logs so that they're available to Microsoft Support.

    Screenshot to open a support ticket.

For more information on creating a support ticket, see Create a support request.

PowerShell

If you're using PowerShell and need to resume a previously failed update run, use the following command:

Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate

To resume an update run that failed because update health checks returned a Warning state, use the following command:

Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate -IgnoreWarnings
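
Before you bypass warnings, it's worth confirming that no Critical check is also failing, because Critical failures must be resolved first. The following sketch shows one way to check that. The Test-OnlyWarningsBlock function and the sample data are hypothetical, and the CRITICAL and WARNING strings are assumed to follow the same uppercase pattern as the INFORMATIONAL value shown in the sample outputs.

```powershell
# Hypothetical helper: returns $true only when at least one WARNING check is
# failing and no CRITICAL check is failing.
function Test-OnlyWarningsBlock {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$HealthCheckResult
    )
    $failing  = @($HealthCheckResult | Where-Object { $_.Status -ne 'SUCCESS' })
    $critical = @($failing | Where-Object { $_.Severity -eq 'CRITICAL' })
    $warning  = @($failing | Where-Object { $_.Severity -eq 'WARNING' })
    return ($critical.Count -eq 0 -and $warning.Count -gt 0)
}

# Made-up results; on a real system, use the HealthCheckResult property of
# the object that Get-SolutionUpdate returns.
$sample = @(
    [pscustomobject]@{ Title = 'Example warning check'; Status = 'FAILURE'; Severity = 'WARNING' }
    [pscustomobject]@{ Title = 'Example passing check'; Status = 'SUCCESS'; Severity = 'CRITICAL' }
)

$canBypass = Test-OnlyWarningsBlock -HealthCheckResult $sample
$canBypass
```

Even when the result is true, review each warning's Remediation text before you decide to bypass it.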

Next steps

Learn more about how to Run updates via PowerShell.

Learn more about how to Run updates via the Azure portal.