Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to: Hyperconverged deployments of Azure Local
This article describes how to troubleshoot solution updates that you apply to your Azure Local instance to keep it up-to-date.
About troubleshooting updates
If you deploy your system by using a new Azure Local installation, you install the orchestrator as part of that deployment. The orchestrator manages platform updates, including the OS, drivers and firmware, and agents and services.
The new update solution includes retry and remediation logic. This logic attempts to fix update problems in a nondisruptive way, such as retrying a Cluster-Aware Update (CAU) run. If an update run can't be remediated automatically, it fails. When an update fails, inspect the details for the failure message to determine the appropriate next action. You can attempt to resume the update, if appropriate, to determine if a retry resolves the problem.
Troubleshoot readiness checks
Readiness checks are essential to ensure that you apply updates smoothly, keep your systems up-to-date, and maintain correct system functionality. Readiness checks are performed and reported separately in two scenarios:
System health checks that run once every 24 hours.
Update readiness checks that run after downloading the update content and before beginning the installation.
It is common for the results of system health checks and update readiness checks to differ. This happens because update readiness checks use the latest validation logic from the solution update to be installed, while system health checks always use validation logic from the installed version.
Both system and pre-update readiness checks perform similar validations and categorize three types of readiness checks: Critical, Warning, and Informational.
- Critical: Readiness checks that prevent you from applying the update. This status indicates issues that you must resolve before proceeding with the update.
- Warning: Readiness checks that also prevent you from applying the update, but you can bypass these. This status indicates potential issues that might not be severe enough to stop the update but should be addressed to ensure a smooth update process.
- Informational: Readiness checks that don't block the update. This status provides information about the system's state and any potential issues that shouldn't affect the update process directly. These checks are for your awareness and might not require immediate action.
The troubleshooting steps vary depending on the readiness check scenario.
Using Azure portal
Using Azure portal, scenario 1: System health checks
This scenario occurs when preparing to install system updates in Azure Update Manager:
In the system list, select the Update readiness status. Systems that require attention show a Critical or Warning state.
Review the list of readiness checks and their results.
Select the links under Details to view more information.
When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.
Follow the remediation steps provided to resolve any failures.
After resolving the issues, select Check again to rerun the readiness checks.
Using the Azure portal, scenario 2: Update readiness checks
This scenario occurs when you install and track system updates in Azure Update Manager:
In History, select the failed update run from the list.
On the Validate that the system is ready tab, review the list of readiness checks and their results.
Select the View details links under Details.
When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.
To resolve the failures, follow the remediation instructions, and then select the Try again button to retry the pre-update readiness checks and Resume the update.
To further troubleshoot, see the PowerShell section.
Using PowerShell
Using PowerShell, scenario 1: System health checks
To troubleshoot system health checks by using PowerShell:
To validate that the system health checks failed, run the following command on one of the machines in your system:
Get-SolutionUpdateEnvironmentHere's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment ResourceId : redmond SbeFamily : VirtualForTesting HardwareModel : Virtual Machine LastChecked : 9/12/2023 10:34:42 PM PackageVersions : {Solution: 10.2309.0.20, Services: 10.2309.0.20, Platform: 1.0.0.0, SBE: 4.0.0.0} CurrentVersion : 10.2309.0.20 CurrentSbeVersion : 4.0.0.0 LastUpdated : State : AppliedSuccessfully HealthState : Failure HealthCheckResult : {Storage Pool Summary, Storage Services Physical Disks Summary, Storage Services Physical Disks Summary, Storage Services Physical Disks Summary...} HealthCheckDate : 9/12/2023 7:03:32 AM AdditionalData : {[SBEAdditionalData, Solution Builder extension is partially installed. Please install the latest Solution Builder Extension provided by your hardware vendor. For more information, see https://aka.ms/SBE.]} HealthState : Success HealthCheckResult : {} HealthCheckDate : 8/4/2022 9:10:36 PM PS C:\Users\lcmuser>Review the
HealthStateon your system and view theFailureorWarningvalue.To filter the
HealthCheckResultproperty to identify failing tests, run the following command:$result = Get-SolutionUpdateEnvironment $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | Format-List Title, Status, Severity, Description, RemediationHere's a sample output:
Title : The machine proxy on each failover cluster node should be set to a local proxy server Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : `https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness` Title : The CAU clustered role should be installed on the failover cluster to enable self-updating mode Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : `https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness`Review the
Remediationproperty for the failed tests and take action as appropriate to resolve the failures.If you need more diagnostic information to determine why tests failed, examine the
AdditionalDataproperty by using the-FullHealthCheckDetailsparameter:$FullResults = Get-SolutionUpdateEnvironment -FullHealthCheckDetails $Failures = $FullResults.HealthCheckResult | Where-Object { $_.Status -ne "SUCCESS" -and $_.Severity -ne "INFORMATIONAL" } $Failures | Format-List *If available, use the diagnostic information shown in the
AdditionalDataproperty, such asFailedMachines,Source, andExceptionMessageto determine which physical machines are causing the test failure. Then use the link in theRemediationproperty to resolve the failures.After you resolve the failure, invoke the system health checks again by running the following command:
Invoke-SolutionUpdatePrecheck -SystemHealthUse
Get-SolutionUpdateEnvironmentto confirm the failing health check failures are resolved. It might take a few minutes for the system health checks to run.Here's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment | Format-List HealthState, HealthCheckResult, HealthCheckDate HealthState : InProgress HealthCheckResult : HealthCheckDate : 1/1/0001 12:00:00 AM PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment | Format-List HealthState, HealthCheckResult, HealthCheckDate HealthState : Success HealthCheckResult : {Storage Pool Summary, Storage Subsystem Summary, Storage Services Summary, Storage Services Summary...} HealthCheckDate : 10/18/2024 11:56:49 PM
Using PowerShell, scenario 2: Update readiness checks
When update readiness checks fail, the system update also fails. To troubleshoot update readiness checks by using PowerShell:
To verify that update readiness checks failed, run the following command on one machine in your system to find the
ResourceIdof the update:Get-SolutionUpdate | Format-Table ResourceId, State, VersionHere's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdate | Format-Table ResourceId, State, Version ResourceId State Version ---------- ----- ------- redmond/Solution10.2405.2.7 HealthCheckFailed 10.2405.2.7Using the update
ResourceIdfrom the previous command's output, run the following command and replace theIdparameter with that value:Get-SolutionUpdate -Id <Resource ID> | Format-Table Version, State, HealthCheckResultHere's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdate -Id redmond/Solution10.2405.2.7 | Format-Table Version, State, HealthCheckResult Version State HealthCheckResult ------- ----- ----------------- 10.2405.2.7 HealthCheckFailed {Storage Subsystem Summary, Storage Pool Summary, Storage Services Physical Disks Summary, Stora... PS C:\Users\lcmuser>Review the
Statefor the update and view theHealthCheckResultvalue.To filter the
HealthCheckResultproperty and identify failed tests, run the following command. Replace theIdvalue with the updateResourceIdfrom the previous command.$result = Get-SolutionUpdate -Id <Resource ID> $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | Format-List Title, Status, Severity, Description, RemediationHere's a sample output:
Title : The machine proxy on each failover cluster node should be set to a local proxy server Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness Title : The CAU clustered role should be installed on the failover cluster to enable self-updating mode Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readinessUse the link in the
Remediationproperty for the failed test. Open the article on a device with a web browser, and follow the recommended steps to resolve the failures.If you need more diagnostic information to determine why tests failed, examine the
AdditionalDataproperty by using the-FullHealthCheckDetailsparameter:$FullResults = Get-SolutionUpdate -Id <Resource ID> -FullHealthCheckDetails $Failures = $FullResults.HealthCheckResult | Where-Object { $_.Status -ne "SUCCESS" -and $_.Severity -ne "INFORMATIONAL" } $Failures | Format-List *If available, use diagnostic details in the
AdditionalDataproperty, such asFailedMachines,Source, andExceptionMessage, to identify which physical machines caused the test failure. Then use the link in theRemediationproperty to resolve the failures.After you resolve the failure, invoke the update readiness checks again by running the following command:
Get-SolutionUpdate -Id <Resource ID> | Start-SolutionUpdate -PrepareOnly
Troubleshoot update failures
When an update fails, review the detailed step progress to identify the point of failure. This review helps you determine whether you can resolve the issue by repairing and resuming the update or whether you need to contact support.
Failing step name and description.
The machine or server where the step failed (if there's a machine-specific issue).
Failure message string (might pinpoint the issue to a specific known issue with documented remediation).
Use the Azure portal to identify failing step details, as described in Resume an update. Alternatively, see the next section for similar details in PowerShell by using Start-MonitoringActionplanInstanceToComplete.
The following table lists update failure scenarios and remediation guidelines.
| Step names | Type of issue | Remediation |
|---|---|---|
| Any | Power loss or other similar interruption to system during the update. | 1. Restore power. 2. Run a system health check. 3. Resume the update. |
| CAU updates | Cluster Aware Update (CAU) update run fails with a max retries exceeded failure. |
If multiple CAU attempts fail, investigate the first failure. Use the start and end time of the first failure to match up with the correct Get-CauReport output to further investigate the failure. |
| Any | Memory, power supply, boot driver, or similar critical failure on one or more nodes. | See Repair a node on Azure Local for how to repair the failing node. After the node is repaired, resume the update. |
Collect update logs
You can also collect diagnostic logs to help Microsoft identify and fix the issues.
To collect logs for updates by using the Azure portal, see Resume an update.
To collect logs for the update failures, see Collect diagnostic logs for Azure Local.
View update summary report
To view a detailed update summary report by using PowerShell, follow these steps on the client that you're using to access your system:
Establish a remote PowerShell session with the machine. Run PowerShell as an administrator and run the following command:
Enter-PSSession -ComputerName <machine_IP_address> -Credential <username\password for the machine>Get all the solutions updates, and then filter the solution updates that correspond to a specific version. The version you use corresponds to the version of solution update that failed to install.
$Update = Get-SolutionUpdate | ? Version -eq "<Version string>" -verboseIdentify the action plan for the failed solution update run.
$Failure = $update | Get-SolutionUpdateRunIdentify the
ResourceIDfor the update.$FailureHere's a sample output:
PS C:\Users\lcmuser> $Update = Get-SolutionUpdate | ? Version -eq "10.2303.1.7" -verbose PS C:\Users\lcmuser> $Failure = $Update | Get-SolutionUpdateRun PS C:\Users\lcmuser> $Failure ResourceId : redmond/Solution10.2303.1.7/a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1 Progress : Microsoft.AzureStack.Services.Update.ResourceProvider.UpdateService.Models.Step TimeStarted : 4/21/2023 10:02:54 PM LastUpdatedTime : 4/21/2023 3:19:05 PM Duration : 00:16:37.9688878 State : FailedNote the
ResourceIDGUID. This GUID corresponds to theActionPlanInstanceID.View the summary for the
ActionPlanInstanceIDthat you noted earlier.Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID <Action Plan Instance ID>Here's sample output:
PS C:\Users\lcmuser> Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1
Resume an update
To resume a previously failed update run, retry the update run through the Azure portal or PowerShell.
The Azure portal
Use the Azure portal to browse to your failed update and select the Try again button. You can find this functionality at the Download updates, Validate that the system is ready, and Install updates stages of an update run.
If you can't successfully rerun a failed update or need to troubleshoot an error further, follow these steps:
Select the View details of an error.
When the details box opens, review the error details. For more information on collecting diagnostic logs, select the How to collect logs link near the Open a support ticket button.
For more information on retrieving logs, see Collect diagnostic logs for Azure Local.
Select the Open a support ticket button. Fill in the appropriate information, and attach your logs so that they're available to Microsoft Support.
For more information on creating a support ticket, see Create a support request.
PowerShell
If you're using PowerShell and need to resume a previously failed update run, use the following command:
Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate
To resume a previously failed update due to update health checks in a Warning state, use the following command:
Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate -IgnoreWarnings
Next steps
Learn more about how to Run updates via PowerShell.
Learn more about how to Run updates via the Azure portal.