In ADX and Kusto, does the 'U' character in the KSENGINE00000U NodeId refer to anything?

MarkL 5 Reputation points
2026-03-30T06:32:51.6633333+00:00

Hi there

I'm tracking down operation and query failures that occur when nodes become unavailable, for example when a node transfers to primary and our queries fail. I'd like to know how often this occurs and whether our queries cause node transfers or failovers.

Regarding the node IDs:
KSENGINE00000P - does this refer to the primary node?
KSENGINE00000U - does the U character refer to unassigned or unknown?
KSENGINE000025 - do IDs like this refer to a child node?

Thanks

Azure Data Explorer

An Azure data analytics service for real-time analysis on large volumes of data streaming from sources including applications, websites, and internet of things devices.


3 answers

  1. MarkL 5 Reputation points
    2026-03-30T23:34:31.7966667+00:00

    I can't accept either answer, as I haven't learnt anything that a simple Claude or Copilot question wouldn't already have told me. I'm after real answers, i.e. undocumented knowledge and real under-the-hood investigation. I want to know when and how often these nodes are failing, as I have users complaining about these internal transition errors, and I'm hoping for an answer that lets me monitor this.


  2. Pilladi Padma Sai Manisha 7,055 Reputation points Microsoft External Staff Moderator
    2026-03-30T22:43:27.8966667+00:00

    Hi MarkL,

    I get what you’re after here: not documentation, but something you can actually use to understand how often this is happening and whether it’s affecting your users.

    First, to be straight: there’s no supported way to directly see node failovers or interpret suffixes like P or U. Those node IDs and their states are internal to ADX and aren’t exposed in a way you can reliably monitor or map to roles.

    What you can do (and what teams typically do in practice) is track the impact pattern, because node movement or rebalancing tends to show up externally in a consistent way.

    When a node is reassigned, restarted, or load is redistributed, you’ll usually see a short burst of failed or cancelled queries within a tight time window. That’s your signal.

    To make this measurable, start by pulling failed queries using .show queries and group them into small time buckets (1–5 minutes). When you plot that over time, you’ll see spikes; those spikes are effectively your “node transition events”.
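As a rough illustration of the bucketing step described above, here is a minimal Python sketch. It assumes you have already exported the start times of failed queries (for example from `.show queries` output) as timezone-aware UTC datetimes. The function names `bucket_failures` and `find_spikes`, the 5-minute window, the threshold of 3, and the sample timestamps are all made up for illustration and are not part of any ADX API. A flagged bucket only suggests a possible node transition; it does not prove one.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_failures(timestamps, bucket_minutes=5):
    """Count failures per fixed-size time bucket, keyed by bucket start (UTC)."""
    size = timedelta(minutes=bucket_minutes)
    counts = Counter()
    for ts in timestamps:
        index = (ts - EPOCH) // size  # whole buckets elapsed since the epoch
        counts[EPOCH + index * size] += 1
    return dict(sorted(counts.items()))


def find_spikes(buckets, threshold=3):
    """Return the start times of buckets with at least `threshold` failures."""
    return [start for start, count in buckets.items() if count >= threshold]


# Hypothetical sample data: four failures close together, then one stray failure.
base = datetime(2026, 3, 30, 6, 0, tzinfo=timezone.utc)
sample_failures = [
    base + timedelta(minutes=1),
    base + timedelta(minutes=2),
    base + timedelta(minutes=3),
    base + timedelta(minutes=4),
    base + timedelta(minutes=47),
]

buckets = bucket_failures(sample_failures)
spikes = find_spikes(buckets)

for start, count in buckets.items():
    marker = "  <-- spike" if start in spikes else ""
    print(f"{start.isoformat()}  failures={count}{marker}")
```

The threshold is arbitrary here; in practice you would probably tune it against your normal background failure rate.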

    If you have diagnostics enabled to Log Analytics, this becomes much more useful. You can trend query failures over time, identify how often these spikes occur, how long they last, and whether the frequency is increasing. That’s the most practical way to answer “how often are nodes impacting my workload”.

    One important point from experience: queries don’t trigger node failovers. What they do is expose them, especially if they’re long-running and get interrupted mid-execution.

    So the realistic approach is: you won’t be able to monitor the node itself, but you can monitor its effect very reliably through failure patterns.

    If your users are seeing this frequently, the key thing to check is how often those failure spikes occur. Occasional short bursts are expected in a distributed system, but repeated or sustained spikes are not, and having that timeline makes it much easier to take the issue forward.

    If you want, I can help you put together a simple query or Log Analytics view to visualize those spikes clearly.


  3. Martin Dimovski 1,711 Reputation points
    2026-03-30T17:21:43.7933333+00:00

    Hi,

    I don’t think Microsoft publicly documents the meaning of the suffix letters in node IDs such as KSENGINE00000P or KSENGINE00000U. Azure Data Explorer treats the compute nodes as an internal platform detail: Microsoft says you don’t see or manage the node VMs directly, and the service automatically manages instance creation, health monitoring, and replacement of unhealthy nodes. Because of that, I would avoid assigning a supported meaning such as “P = primary” or “U = unassigned/unknown” unless Microsoft support confirms it.

    So my practical reading would be: those values are internal engine/node identifiers, not something you should rely on as a documented topology model such as “primary / child / unassigned.” I also don’t see public documentation that maps those suffixes to node roles. You can find more here: https://learn.microsoft.com/en-us/azure/reliability/reliability-data-explorer

    On the failover question, Azure Data Explorer is a distributed service, and Microsoft documents that the platform handles health monitoring, replacement of unhealthy nodes, and response to availability zone failures automatically. The documentation also notes that transient faults and occasional connectivity loss can happen during normal cloud operations and service maintenance, and it recommends retrying failed queries and management operations. That suggests node movement/failover is primarily a platform event, not something your queries directly trigger in a supported/documented way.
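The retry recommendation above could look something like the following Python sketch. It is a generic exponential-backoff wrapper, not code from Microsoft or any Kusto client library. `TransientQueryError` is a hypothetical stand-in for whatever transient exception your client actually raises, and `run_query` is any zero-argument callable you supply. The attempt count and delays are arbitrary assumptions.

```python
import random
import time


class TransientQueryError(Exception):
    """Hypothetical stand-in for a client's transient/retryable error."""


def run_with_retry(run_query, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call run_query(), retrying transient failures with exponential backoff.

    Re-raises the last error once max_attempts is exhausted. Other exception
    types are not caught and propagate immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except TransientQueryError:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            # Add jitter so many clients do not all retry at the same instant.
            sleep(delay + random.uniform(0, delay / 2))
```

Only retry operations that are safe to repeat; a read-only query is usually fine, while some management commands may not be.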

    Hope this helps

