Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to: Hyperconverged deployments of Azure Local
This article describes how to troubleshoot solution updates that are applied to your Azure Local to keep it up-to-date.
About troubleshooting updates
If your system was created via a new deployment of Azure Local, then an orchestrator was installed during the deployment. The orchestrator manages all of the updates for the platform - OS, drivers and firmware, agents and services.
The new update solution includes a retry and remediation logic. This logic attempts to fix update issues in a nondisruptive way, such as retrying a Cluster-Aware Update (CAU) run. If an update run can't be remediated automatically, it fails. When an update fails, Microsoft recommends inspecting the details for the failure message to determine the appropriate next action. You can attempt to resume the update, if appropriate, to determine if a retry resolves the issue.
Troubleshoot readiness checks
Readiness checks are essential to ensure that you apply updates smoothly, keep your systems up-to-date, and maintain correct system functionality. Readiness checks are performed and reported separately in two scenarios:
System health checks that run once every 24 hours.
Update readiness checks that run after downloading the update content and before beginning the installation.
It is common for the results of system health checks and update readiness checks to differ. This happens because update readiness checks use the latest validation logic from the solution update to be installed, while system health checks always use validation logic from the installed version.
Both system and pre-update readiness checks perform similar validations and categorize three types of readiness checks: Critical, Warning, and Informational.
- Critical: Readiness checks that prevent you from applying the update. This status indicates issues that you must resolve before proceeding with the update.
- Warning: Readiness checks that also prevent you from applying the update, but you can bypass these. This status indicates potential issues that might not be severe enough to stop the update but should be addressed to ensure a smooth update process.
- Informational: Readiness checks that don't block the update. This status provides information about the system's state and any potential issues that shouldn't affect the update process directly. These checks are for your awareness and might not require immediate action.
The troubleshooting steps differ depending on which scenario the readiness checks are from.
Using Azure portal
Using Azure portal, scenario 1: System health checks
This scenario occurs when preparing to install system updates in Azure Update Manager:
In the system list, view the Critical state of Update readiness.
Select one or more systems from the list, then select Install now.
On the Check readiness tab, review the list of readiness checks and their results.
Select the links under Details.
When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.
To resolve the failures, follow the remediation instructions provided.
Note
The system health checks run every 24 hours, so it might take up to 24 hours for the new results to sync to the Azure portal after remediating the failures. To initiate a new system health check immediately or further troubleshoot, see the PowerShell section.
Using Azure portal, scenario 2: Update readiness checks
This scenario occurs when installing and tracking system updates in Azure Update Manager:
In History, select the failed update run from the list.
On the Validate that the system is ready tab, review the list of readiness checks and their results.
Select the View details links under Details.
When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.
To resolve the failures, follow the remediation instructions, and then select the Try again button to retry the pre-update readiness checks and Resume the update.
To further troubleshoot, see the PowerShell section.
Using PowerShell
Using PowerShell, scenario 1: System health checks
To troubleshoot system health checks via PowerShell:
To validate that the system health checks failed, run the following command on one of the machines in your system:
Get-SolutionUpdateEnvironmentHere's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment ResourceId : redmond SbeFamily : VirtualForTesting HardwareModel : Virtual Machine LastChecked : 9/12/2023 10:34:42 PM PackageVersions : {Solution: 10.2309.0.20, Services: 10.2309.0.20, Platform: 1.0.0.0, SBE: 4.0.0.0} CurrentVersion : 10.2309.0.20 CurrentSbeVersion : 4.0.0.0 LastUpdated : State : AppliedSuccessfully HealthState : Failure HealthCheckResult : {Storage Pool Summary, Storage Services Physical Disks Summary, Storage Services Physical Disks Summary, Storage Services Physical Disks Summary...} HealthCheckDate : 9/12/2023 7:03:32 AM AdditionalData : {[SBEAdditionalData, Solution Builder extension is partially installed. Please install the latest Solution Builder Extension provided by your hardware vendor. For more information, see https://aka.ms/SBE.]} HealthState : Success HealthCheckResult : {} HealthCheckDate : 8/4/2022 9:10:36 PM PS C:\Users\lcmuser>Review the
HealthStateon your system and view theFailureorWarningvalue.To filter the
HealthCheckResultproperty to identify failing tests, run the following command:$result = Get-SolutionUpdateEnvironment $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | Format-List Title, Status, Severity, Description, RemediationHere's a sample output:
Title : The machine proxy on each failover cluster node should be set to a local proxy server Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : `https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness` Title : The CAU clustered role should be installed on the failover cluster to enable self-updating mode Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : `https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness`Review the
Remediationproperty for the failed tests and take action as appropriate to resolve the failures.If you require more diagnostic information to determine the cause of the failed tests, examine the
AdditionalDataproperty by using the-FullHealthCheckDetailsparameter:$FullResults = Get-SolutionUpdateEnvironment -FullHealthCheckDetails $Failures = $FullResults.HealthCheckResult | Where-Object { $_.Status -ne "SUCCESS" -and $_.Severity -ne "INFORMATIONAL" } $Failures | Format-List *If present, use the diagnostic information shown in
AdditionalDataproperty, such as 'FailedMachines', 'Source' and/or "ExceptionMessage" to determine which physical machines are causing the test failure, together with information provided in the link from theRemediationproperty to resolve the failures.After resolving the failures, invoke the system health checks again by running the following command:
Invoke-SolutionUpdatePrecheck -SystemHealthUse
Get-SolutionUpdateEnvironmentto confirm the failing health check failures are resolved. It might take a few minutes for the system health checks to run.Here's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment | Format-List HealthState, HealthCheckResult, HealthCheckDate HealthState : InProgress HealthCheckResult : HealthCheckDate : 1/1/0001 12:00:00 AM PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment | Format-List HealthState, HealthCheckResult, HealthCheckDate HealthState : Success HealthCheckResult : {Storage Pool Summary, Storage Subsystem Summary, Storage Services Summary, Storage Services Summary...} HealthCheckDate : 10/18/2024 11:56:49 PM
Using PowerShell, scenario 2: Update readiness checks
When update readiness checks fail, this causes the update to fail on the system. To troubleshoot update readiness checks via PowerShell:
To validate that the update readiness checks failed, run the following command on one of the machines in your system to identify the
ResourceIdof the update:Get-SolutionUpdate | Format-Table ResourceId, State, VersionHere's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdate | Format-Table ResourceId, State, Version ResourceId State Version ---------- ----- ------- redmond/Solution10.2405.2.7 HealthCheckFailed 10.2405.2.7Using the
ResourceIdof the update you're attempting to install, run the following command replacing theIdparameter with the ResourceId of your update, from the output of the previous command:Get-SolutionUpdate -Id <Resource ID> | Format-Table Version, State, HealthCheckResultHere's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdate -Id redmond/Solution10.2405.2.7 | Format-Table Version, State, HealthCheckResult Version State HealthCheckResult ------- ----- ----------------- 10.2405.2.7 HealthCheckFailed {Storage Subsystem Summary, Storage Pool Summary, Storage Services Physical Disks Summary, Stora... PS C:\Users\lcmuser>Review the
Statefor the update and view theHealthCheckResultvalue.To filter the
HealthCheckResultproperty to identify failing tests, run the following command, replace theIdvalue with theResourceIdof your update, identified from the earlier command:$result = Get-SolutionUpdate -Id <Resource ID> $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | Format-List Title, Status, Severity, Description, RemediationHere's a sample output:
Title : The machine proxy on each failover cluster node should be set to a local proxy server Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness Title : The CAU clustered role should be installed on the failover cluster to enable self-updating mode Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readinessUse the link shown in the
Remediationproperty of the failed test. Review the article using a device with a web-browser to determine appropriate actions to resolve the failures.If you require more diagnostic information to determine the cause of the failed tests, examine the
AdditionalDataproperty by using the-FullHealthCheckDetailsparameter:$FullResults = Get-SolutionUpdate -Id <Resource ID> -FullHealthCheckDetails $Failures = $FullResults.HealthCheckResult | Where-Object { $_.Status -ne "SUCCESS" -and $_.Severity -ne "INFORMATIONAL" } $Failures | Format-List *If present, use the diagnostic information shown in
AdditionalDataproperty, such as 'FailedMachines', 'Source' and/or "ExceptionMessage" to determine which physical machines are causing the test failure, together with information provided in the link from theRemediationproperty to resolve the failures.After resolving the failures, invoke the update readiness checks again by running the following command
Get-SolutionUpdate -Id <Resource ID> | Start-SolutionUpdate -PrepareOnly
Troubleshoot update failures
When an update fails, review the detailed step progress to identify where it failed. This helps determine whether the issue can be fixed with a simple repair and resume, or requires a support engagement to resolve the issue. Key items to note for the failing step include:
Failing step name and description.
Which machine or server the step failed on (if there's a machine-specific issue).
Failure message string (might pinpoint the issue to a specific known issue with documented remediation).
Microsoft recommends using the Azure portal to identify the failing step information as shown at Resume an update. Alternatively, see the next section for how view similar details in PowerShell using Start-MonitoringActionplanInstanceToComplete.
The following table lists update failure scenarios and remediation guidelines.
| Step names | Type of issue | Remediation |
|---|---|---|
| Any | Power loss or other similar interruption to system during the update. | 1. Restore power. 2. Run a system health check. 3. Resume the update. |
| CAU updates | Cluster Aware Update (CAU) update run fails with a max retries exceeded failure. |
If multiple CAU attempts fail, investigate the first failure. Use the start and end time of the first failure to match up with the correct Get-CauReport output to further investigate the failure. |
| Any | Memory, power supply, boot driver, or similar critical failure on one or more nodes. | See Repair a node on Azure Local for how to repair the failing node. After the node is repaired, the update can be resumed. |
Collect update logs
You can also collect diagnostic logs to help Microsoft identify and fix the issues.
To collect logs for updates using the Azure portal, see Resume an update.
To collect logs for the update failures see Collect diagnostic logs for Azure Local.
View update summary report
To view a detailed update summary report using PowerShell, follow these steps on the client that you're using to access your system:
Establish a remote PowerShell session with the machine. Run PowerShell as administrator and run the following command:
Enter-PSSession -ComputerName <machine_IP_address> -Credential <username\password for the machine>Get all the solutions updates and then filter the solution updates corresponding to a specific version. The version used corresponds to the version of solution update that failed to install.
$Update = Get-SolutionUpdate | ? Version -eq "<Version string>" -verboseIdentify the action plan for the failed solution update run.
$Failure = $update | Get-SolutionUpdateRunIdentify the
ResourceIDfor the update.$FailureHere's a sample output:
PS C:\Users\lcmuser> $Update = Get-SolutionUpdate | ? Version -eq "10.2303.1.7" -verbose PS C:\Users\lcmuser> $Failure = $Update | Get-SolutionUpdateRun PS C:\Users\lcmuser> $Failure ResourceId : redmond/Solution10.2303.1.7/a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1 Progress : Microsoft.AzureStack.Services.Update.ResourceProvider.UpdateService.Models.Step TimeStarted : 4/21/2023 10:02:54 PM LastUpdatedTime : 4/21/2023 3:19:05 PM Duration : 00:16:37.9688878 State : FailedNote the
ResourceIDGUID. This GUID corresponds to theActionPlanInstanceID.View the summary for the
ActionPlanInstanceIDthat you noted earlier.Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID <Action Plan Instance ID>Here's sample output:
PS C:\Users\lcmuser> Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1
Resume an update
To resume a previously failed update run, you can retry the update run via the Azure portal or PowerShell.
The Azure portal
We highly recommend using the Azure portal, to browse to your failed update and select the Try again button. This functionality is available at the Download updates, Validate that the system is ready, and Install updates stages of an update run.
If you're unable to successfully rerun a failed update or need to troubleshoot an error further, follow these steps:
Select the View details of an error.
When the details box opens, you can review the error details. For more information on collecting diagnostic logs, you can select the How to collect logs link near the Open a support ticket button.
For more information on retrieving logs, see Collect diagnostic logs for Azure Local.
Additionally, you can select the Open a support ticket button, fill in the appropriate information, and attach your logs so that they're available to Microsoft Support.
For more information on creating a support ticket, see Create a support request.
PowerShell
If you're using PowerShell and need to resume a previously failed update run, use the following command:
Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate
To resume a previously failed update due to update health checks in a Warning state, use the following command:
Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate -IgnoreWarnings
Next steps
Learn more about how to Run updates via PowerShell.
Learn more about how to Run updates via the Azure portal.