It seems the failure is not caused by your runbook code, authentication logic, or managed identity. The error occurs before PowerShell is initialized. Connect-AzAccount -Identity and Connect-MgGraph -Identity do not decrypt Automation assets.
Azure Automation job execution has two phases: service‑side and customer‑side.
Job initialization (service‑side): job metadata and parameters are loaded, Automation account assets are decrypted, and a worker slot is allocated. Your failure occurs here.
If this phase fails:
- No PowerShell host is created
- No runbook code runs
- No logs or streams are generated
- Only a generic .NET exception is shown
Runbook execution (customer‑side):
- PowerShell starts
- Connect-AzAccount / Connect-MgGraph execute
- Your email logic runs
It seems you never reach this phase. Because managed identity authentication does not use stored secrets, it is not involved in Automation asset encryption. If authentication were the issue, you would see a PowerShell error stream, which you do not.
The email runbook fails more often only because it runs more frequently, increasing the chance of hitting an unhealthy backend instance.
Delayed/queued jobs further confirm Automation backend instability, not script problems.
What you can try now:
Remove legacy Connect-AzAccount -Identity: The command is not required for Graph email functionality. Removing it prevents token request collisions at job start.
Test a minimal auth runbook: Create a runbook that does only managed identity token acquisition:
Connect-MgGraph -Identity
- Run this repeatedly on the same schedule/parallelism as your production email script.
- Purpose: simulate the failure condition without other variables.
- If failures appear here, it confirms token acquisition collisions are the root cause.
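A slightly fuller version of that test runbook could look like the sketch below. It assumes the Microsoft.Graph.Authentication module is imported into the Automation account and that a system-assigned managed identity is enabled; the timestamped messages are only an illustration, added so that failed runs can be matched against the schedule.

```powershell
# Minimal test runbook: acquire a managed identity token for Microsoft Graph, nothing else.
# Assumption: Microsoft.Graph.Authentication is available in the Automation account.
$ErrorActionPreference = 'Stop'

# Timestamp each run so results can be correlated with the job schedule.
$startedUtc = (Get-Date).ToUniversalTime().ToString('o')

try {
    Connect-MgGraph -Identity | Out-Null

    # Get-MgContext returns the current Graph session details.
    $context = Get-MgContext
    Write-Output "[$startedUtc] Token acquired. AuthType=$($context.AuthType)"
}
catch {
    Write-Output "[$startedUtc] Token acquisition failed: $($_.Exception.Message)"
    throw   # Re-throw so the job is marked as Failed.
}
```

If a job fails without producing either message, the failure most likely happened before PowerShell started.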
Reduce concurrency / stagger schedules: For high-frequency jobs, consider:
- Staggering start times across multiple workers.
- Ensuring jobs do not overlap.
- Introducing short random delays at job start if possible.
This mitigates token contention at the Automation platform level.
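As a rough sketch of the random-delay idea, the snippet below could go at the very top of the runbook. The function name `Get-StartJitterSeconds` and the 30-second ceiling are assumptions chosen for illustration; pick a ceiling that fits comfortably inside your schedule interval.

```powershell
# Hypothetical helper: return a random delay in whole seconds, from 0 up to (but not including) MaxSeconds.
function Get-StartJitterSeconds {
    param(
        [int]$MaxSeconds = 30   # Assumed ceiling; adjust to your schedule interval.
    )
    # Get-Random treats -Maximum as exclusive.
    return Get-Random -Minimum 0 -Maximum $MaxSeconds
}

$delaySeconds = Get-StartJitterSeconds -MaxSeconds 30
Write-Output "Delaying start by $delaySeconds second(s)."
Start-Sleep -Seconds $delaySeconds
```

Note that the delay only runs once PowerShell has started, so it spreads out the authentication calls inside the runbook and does not change when the platform initializes the job.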
Monitor job queue delays: Use this as a signal of token pressure:
Automation Account > Jobs > filter status = Queued
- Long queues in your every-5-minute job indicate platform token pressure.
- This is expected if multiple -Identity calls occur concurrently.
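The same check can be scripted instead of using the portal. The sketch below is meant to be run from a workstation or Cloud Shell session that is already signed in with the Az.Automation module installed, not from inside the runbook; the resource group and account names are placeholders.

```powershell
# Placeholder names; replace with your own values.
$resourceGroupName     = 'my-resource-group'
$automationAccountName = 'my-automation-account'

# List jobs currently in the Queued state for the Automation account.
$queuedJobs = Get-AzAutomationJob `
    -ResourceGroupName $resourceGroupName `
    -AutomationAccountName $automationAccountName `
    -Status Queued

# Oldest first, so long-waiting jobs are easy to spot.
$queuedJobs |
    Sort-Object CreationTime |
    Select-Object RunbookName, JobId, Status, CreationTime |
    Format-Table -AutoSize
```

Comparing `CreationTime` with the current time gives a rough idea of how long each job has been waiting.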
Diagnostic Logging: You can keep JobLogs enabled.
- Don’t expect inner exception details; the failure occurs before script execution, so logs are limited.
- Still, logs confirm which jobs fail and under what schedule, helping with frequency adjustments.
In summary, this might be caused by intermittent managed identity token acquisition failures during job initialization under high-frequency or concurrent runs, which then surface as payload decryption errors.
We have reached out to you in a private message. Could you please take a look? Thanks.