Hi MarkL,
I get what you’re after here: not documentation, but something you can actually use to understand how often this is happening and whether it’s affecting your users.
First, to be straight: there’s no supported way to directly see node failovers or interpret suffixes like P or U. Those node IDs and their states are internal to ADX and aren’t exposed in a way you can reliably monitor or map to roles.
What you can do (and what teams typically do in practice) is track the impact pattern, because node movement or rebalancing always shows up externally in a consistent way.
When a node is reassigned, restarted, or load is redistributed, you’ll usually see a short burst of failed or cancelled queries within a tight time window. That’s your signal.
To make this measurable, start by pulling failed queries using .show queries and group them into small time buckets (1–5 minutes). When you plot that over time, you’ll see spikes; those spikes are effectively your “node transition events”.
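A minimal sketch of that bucketing, assuming the documented `.show queries` output columns (`State`, `StartedOn`); note that `.show queries` only returns recent queries visible to your principal (cluster admins see everyone’s):

```kusto
// Bucket recent failed queries into 5-minute windows.
// Sketch only — adjust the window size (1m–5m) and State filter to your needs.
.show queries
| where State == "Failed"
| summarize FailedCount = count() by Window = bin(StartedOn, 5m)
| order by Window asc
| render timechart
```

Clusters of tall bars in the resulting chart are your candidate node-transition events.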
If you have diagnostic logs flowing to Log Analytics, this becomes much more useful. You can trend query failures over time and identify how often these spikes occur, how long they last, and whether the frequency is increasing. That’s the most practical way to answer “how often are nodes impacting my workload”.
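In Log Analytics the equivalent trend looks something like this — a sketch assuming you’re using resource-specific diagnostic tables, where ADX query logs land in the `ADXQuery` table (column names may differ slightly in your workspace, so verify against your schema):

```kusto
// Trend failed ADX queries over the last 7 days, in 5-minute buckets.
// Table/column names (ADXQuery, FailureReason, TimeGenerated) are assumptions
// based on resource-specific diagnostic settings — check your workspace schema.
ADXQuery
| where TimeGenerated > ago(7d)
| where isnotempty(FailureReason)
| summarize FailedCount = count() by Window = bin(TimeGenerated, 5m)
| render timechart
```

The advantage over `.show queries` is retention: you can look back weeks rather than the short window the cluster keeps in memory.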
One important point from experience: queries don’t trigger node failovers. What they do is expose them, especially if they’re long-running and get interrupted mid-execution.
So the realistic approach is: you won’t be able to monitor the node itself, but you can monitor its effect very reliably through failure patterns.
If your users are seeing this frequently, the key thing to check is how often those failure spikes occur. Occasional short bursts are expected in a distributed system, but repeated or sustained spikes are not, and having that timeline makes it much easier to take forward.
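To turn that into a concrete timeline, you can flag any window whose failure count exceeds a threshold and count spike windows per day — a sketch, again assuming the `ADXQuery` diagnostic table, with the threshold (10) purely illustrative and something you’d tune to your own baseline:

```kusto
// Count "spike" windows (>10 failures in 5 minutes) per day over 30 days.
// ADXQuery table and the threshold of 10 are assumptions — tune to your workload.
ADXQuery
| where TimeGenerated > ago(30d)
| where isnotempty(FailureReason)
| summarize Failures = count() by Window = bin(TimeGenerated, 5m)
| where Failures > 10
| summarize SpikeWindows = count() by Day = bin(Window, 1d)
| render columnchart
```

A flat or empty chart means the bursts are rare; a rising trend is the evidence worth escalating.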
If you want, I can help you put together a simple query or Log Analytics view to visualize those spikes clearly.