Threads View (Parallel Performance)
This is the most detailed and feature-rich view in the Concurrency Visualizer. Use it to determine whether threads are executing or are blocked, and whether a block is caused by synchronization, I/O, or some other reason.
During profile analysis, the Concurrency Visualizer examines all operating system context switch events for each application thread. Context switches can occur for many reasons, such as the following:
A thread is blocked on a synchronization primitive.
The quantum of a thread expires.
A thread makes a blocking I/O request.
Threads View assigns a category to every context switch at which a thread stops executing. The categories are shown in the legend in the lower-left corner of the view and are explained in their corresponding Help topics. Context switch events are categorized by searching the thread's call stack for well-known blocking APIs. If we do not find a call stack match, we use the wait reason that Windows provides. Although technically correct, the Windows category may reflect an implementation detail rather than the user's expectation or intent. For example, Windows reports blocking on a native slim reader-writer lock as I/O instead of synchronization. In these cases, you should still be able to identify the root cause of a blocking event by examining the call stacks that correspond to its context switch events.
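The categorization strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Concurrency Visualizer's implementation: the API-to-category table, the "module!Function" frame format, and the function name categorize are all hypothetical, and the stack is assumed to be ordered from the innermost frame outward.

```python
from typing import List, Optional, Tuple

# Hypothetical table of well-known blocking APIs (illustrative only; not the
# Concurrency Visualizer's real list).
KNOWN_BLOCKING_APIS = {
    "WaitForSingleObject": "Synchronization",
    "WaitForMultipleObjects": "Synchronization",
    "EnterCriticalSection": "Synchronization",
    "ReadFile": "I/O",
    "WriteFile": "I/O",
    "Sleep": "Sleep",
    "SleepEx": "Sleep",
}


def categorize(stack: List[str], wait_reason: str) -> Tuple[str, Optional[str], List[str]]:
    """Return (category, blocking_api, displayed_stack) for one context switch.

    `stack` is ordered from the innermost frame (where the thread stopped)
    outward, and each frame is assumed to look like "module!Function".
    """
    for i, frame in enumerate(stack):
        function = frame.split("!")[-1]
        if function in KNOWN_BLOCKING_APIS:
            # Culprit API found: drop the deeper frames and keep the API
            # together with the user code that called it.
            return KNOWN_BLOCKING_APIS[function], function, stack[i:]
    # No match: fall back to the OS-supplied wait reason and keep the full stack.
    return wait_reason, None, stack


# Example: a wait that the table recognizes.
category, api, shown = categorize(
    ["ntoskrnl.exe!KiSwapContext", "KernelBase.dll!WaitForSingleObject", "app.exe!Worker"],
    wait_reason="Executive",
)

# Example: a slim reader-writer lock acquisition that the table does not
# recognize, so the wait reason reported by the OS is used instead.
srw_category, srw_api, srw_shown = categorize(
    ["ntdll.dll!RtlAcquireSRWLockExclusive", "app.exe!UpdateCache"],
    wait_reason="I/O",
)
```

In the second call, the lock frame has no table entry, so the sketch falls back to the I/O wait reason passed in, mirroring the miscategorization described above; the untrimmed stack is what lets you spot the real cause.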
Threads View also shows inter-thread dependencies. For example, if you identify a thread that is blocked on a synchronization object, the tool can often show you which thread unblocked it, together with that thread's call stack at the moment it did so.
Finally, when threads are executing, the tool collects samples so that you can analyze which code is executed by one or more threads during an execution segment. In addition to providing sample-based visibility into thread execution, this view also provides call stack tree execution profiling reports and blocking reports.
Usage
The Threads View is intended to serve many purposes. Some typical uses include the following:
Identify reasons why the user interface (UI) of an application is unresponsive during certain execution phases.
Identify the amount of time that is spent blocking on synchronization, I/O, page faults, etc.
Identify the degree of interference with other processes that are executing on the system.
Identify load balancing issues for parallel execution.
Identify the reasons for poor or nonexistent scalability (for example, why the performance of a parallel application does not improve when more logical cores are available in the system).
Understand the degree of concurrency in the application to guide parallelization.
Understand dependencies among worker threads and critical paths of execution.
The rest of this section describes a recommended workflow for getting the most out of this view. First, use the CPU Utilization View to focus on a specific phase of process execution that is of interest. Using Scenario Marker Support in your application can make this step much easier. Once you have zoomed in on an execution time window of interest, switch to the Threads View.
Identifying and Narrowing an Area of Interest
In the Threads View, you will see a timeline view that has time on the X-axis. On the Y-axis, you will see two I/O channels, one for reads and the other for writes, for each physical disk device in the system that had activity during the profile collection. Below the disk channels, you will see a channel for each thread in the process. Initially, the threads are sorted in the order in which they are created, which results in the main application thread being first. You may use the sort option in the upper-left corner of the view to sort threads by another criterion (for example, according to which threads are performing the most execution work).
Next, you can hide threads that are not performing any work in the scenario of interest by selecting their names in the column at the left and then clicking the Hide Selected Threads icon in the toolbar. Such threads can exist for many reasons; for example, they may be idle thread pool threads, which are usually blocked for the entire interval, typically on synchronization. Hide them, because their statistics can pollute the reports with irrelevant information.
You can use the Execution Breakdown tab report to identify additional threads that can be hidden. To see the Execution Breakdown graph, click Per Thread Summary in the active legend. This graph shows the breakdown of thread states for the threads in the application during the currently visible time window. To keep the graph scalable, the number of threads it displays is limited, so in some cases it does not show data for all threads in the application. When this happens, an ellipsis appears in the rightmost position.
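The Execution Breakdown graph is essentially a per-thread sum of state durations over the visible time window. The following sketch illustrates that computation, along with spotting threads that never executed, under assumptions: the segment tuple format and the function names are hypothetical, and "Execution" is assumed to be the state label for running time.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical trace format: (thread_id, state, start_ms, end_ms).
Segment = Tuple[int, str, float, float]


def per_thread_breakdown(segments: List[Segment], window_start: float,
                         window_end: float) -> Dict[int, Dict[str, float]]:
    """Sum time per thread and per state, clipped to the visible window."""
    totals: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tid, state, start, end in segments:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:  # ignore segments entirely outside the window
            totals[tid][state] += overlap
    return {tid: dict(states) for tid, states in totals.items()}


def hide_candidates(breakdown: Dict[int, Dict[str, float]]) -> List[int]:
    """Threads that never executed in the window, e.g. idle pool threads."""
    return sorted(tid for tid, states in breakdown.items()
                  if states.get("Execution", 0.0) == 0.0)


segments = [
    (1, "Execution", 0, 10), (1, "Synchronization", 10, 30),
    (2, "Synchronization", 0, 40),   # blocked for the whole window
    (3, "Execution", 50, 60),        # outside the window
]
summary = per_thread_breakdown(segments, window_start=5, window_end=25)
```

Here thread 2 spends the whole window blocked, so it is the kind of thread the previous paragraph suggests hiding.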
Now that you have narrowed your analysis to a region of interest and selected the threads of interest, you can start your performance analysis. The following sections describe the various tools at your disposal.
Thread-Blocking Details
To understand why a thread is blocked in a given region, you can either pause on or select (by left-clicking) that region. When you pause on a blocking region, we display a tooltip with general information about the blocking event, such as its category, the blocking API (if available), the region start time, and the blocking duration. For the preemption category, the tooltip also shows the process ID and thread ID of the thread that was scheduled on the CPU when the kernel stopped your thread. When you select a blocking region in a channel of interest, the Current stack tab in the bottom window shows, in addition to the tooltip information, the call stack that caused your thread to block. By examining this call stack, you can determine the underlying reason for the blocking event. By default, this view shows complete call stacks, including both user and kernel frames. When the tool can identify a specific API as the culprit, the call stack is trimmed beyond that frame. If the tool cannot determine the root function call that caused the block, the whole call stack is shown so that you can examine it and make that determination yourself.
A single execution path often produces multiple blocking events, so it is valuable to see cumulative blocking delays organized by call stack. For this purpose, we provide a call-tree-based profile report for each blocking category. You can view a profile by selecting one of the blocking category entries in the legend on the left. These reports give you a quick way to decide where to invest your time when you tune your application's performance.
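A minimal sketch of how cumulative blocking time can be organized by call stack, assuming each blocking event has already been reduced to a root-first stack and a duration. It illustrates the idea behind the per-category call-tree reports and is not the tool's implementation; the event format and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical blocking event: (call stack ordered root first, blocked time in ms).
BlockingEvent = Tuple[List[str], float]


def blocking_call_tree(events: List[BlockingEvent]) -> Dict[Tuple[str, ...], float]:
    """Map each call-tree node (a root-first path) to its inclusive blocking time."""
    inclusive: Dict[Tuple[str, ...], float] = defaultdict(float)
    for stack, blocked_ms in events:
        # Every prefix of the stack is an ancestor node that owns this delay.
        for depth in range(1, len(stack) + 1):
            inclusive[tuple(stack[:depth])] += blocked_ms
    return dict(inclusive)


events = [
    (["main", "LoadConfig", "ReadFile"], 30.0),
    (["main", "LoadConfig", "ReadFile"], 20.0),   # same path, delays accumulate
    (["main", "SaveLog", "WriteFile"], 5.0),
]
tree = blocking_call_tree(events)

# Largest nodes first: a quick way to see where tuning effort should go.
hottest = sorted(tree.items(), key=lambda item: item[1], reverse=True)
```

Two short waits on the same path add up to 50 ms under LoadConfig, which is why an aggregated view can surface costs that no single blocking segment makes obvious.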
Inter-thread Dependencies
The Concurrency Visualizer shows you dependencies between blocking threads in your process. To determine the thread whose actions unblocked a thread of interest, click the relevant blocking segment. If the tool can determine the unblocking thread, it draws a line to connect the executing segment that follows your blocking segment to the other thread. That line shows how a different thread unblocked the selected thread. In addition, the Unblocking stack tab is populated with the relevant call stack. Thus, you can quickly identify a blocked thread, learn what it was trying to do, and see what finally enabled it to execute.
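The blocked/unblocking relationship that the connecting line represents can be illustrated with a minimal program in which one thread waits on an event and another thread sets it. The sketch uses Python's threading module only as a compact, self-contained illustration; the function name is hypothetical, and it does not show how any particular runtime's stacks appear in the Concurrency Visualizer.

```python
import threading
import time
from typing import List


def run_dependency_demo() -> List[str]:
    """A worker thread blocks on an event until the main thread signals it."""
    log: List[str] = []
    log_lock = threading.Lock()
    ready = threading.Event()

    def worker() -> None:
        ready.wait()            # blocking segment: a synchronization wait
        with log_lock:
            log.append("worker resumed")

    thread = threading.Thread(target=worker, name="worker")
    thread.start()
    time.sleep(0.05)            # main thread does other work first
    with log_lock:
        log.append("main signaled")
    ready.set()                 # the unblocking call
    thread.join()
    return log


order = run_dependency_demo()
```

In a trace of an equivalent program, the worker's wait would be the blocking segment you click, and the main thread's call that sets the event is the kind of call the Unblocking stack tab is meant to show.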
Thread Execution Details
It is often useful to determine what code threads are running while they execute. Execution regions are displayed as green segments in the timeline graph. Two features help with this.
First, when you click an execution segment in the timeline, we attempt to find the nearest sample profile call stack. If one is found, we display a black caret above the location in the execution segment where the sample was taken and display its call stack in the Current stack tab. You can select other samples by clicking elsewhere in the execution segments. Occasionally no sample can be found. This usually happens because sample profiles are collected once every millisecond, so an execution segment shorter than one millisecond may contain no sample. The sampling interval cannot be changed; one millisecond is a good balance between accuracy and collection overhead.
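A back-of-the-envelope model shows why short execution segments often have no sample. Assuming samples are taken exactly every millisecond and a segment's start time is uniformly random relative to the sampling clock, a segment d milliseconds long receives d samples on average, and a segment shorter than 1 ms contains a sample with probability d. The helper names below are hypothetical.

```python
SAMPLE_PERIOD_MS = 1.0  # the fixed 1-ms sampling interval described above


def expected_samples(duration_ms: float, period_ms: float = SAMPLE_PERIOD_MS) -> float:
    """Average number of samples that land in an execution segment."""
    return duration_ms / period_ms


def chance_of_any_sample(duration_ms: float, period_ms: float = SAMPLE_PERIOD_MS) -> float:
    """Probability that a segment contains at least one sample.

    A segment at least one period long always contains a sample; a shorter
    one contains a sample only if a sampling tick happens to fall inside it.
    """
    return min(1.0, duration_ms / period_ms)
```

For example, under this model a 0.25-ms segment has only a one-in-four chance of containing a sample, while any segment of 1 ms or longer always has at least one.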
Second, the execution sampling profile report, together with its call tree view, can help you understand where execution time is spent. To open it, click the Execution item in the active legend. The execution profile provides a sample report for all enabled (unhidden) threads in the current view, restricted to the time range shown in the window.
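The filtering described above, where only unhidden threads and only samples inside the visible time range contribute, can be sketched as follows. The sample format and function name are hypothetical, and counting inclusive samples per function is a simplification of the real call tree view. Because samples are taken roughly once per millisecond, a count can be read approximately as milliseconds of execution.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical sample record: (thread_id, timestamp_ms, call stack root first).
Sample = Tuple[int, float, List[str]]


def execution_profile(samples: Iterable[Sample], hidden_threads: Set[int],
                      window_start: float, window_end: float) -> Counter:
    """Inclusive sample counts per function for unhidden threads in the window."""
    counts: Counter = Counter()
    for tid, timestamp, stack in samples:
        if tid in hidden_threads or not (window_start <= timestamp < window_end):
            continue  # hidden thread or outside the visible time range
        counts.update(set(stack))  # count each function once per sample
    return counts


samples = [
    (1, 1.0, ["main", "Compute"]),
    (1, 2.0, ["main", "Compute", "Sqrt"]),
    (2, 2.0, ["main", "Idle"]),        # thread 2 is hidden
    (1, 50.0, ["main", "Render"]),     # outside the visible window
]
profile = execution_profile(samples, hidden_threads={2}, window_start=0.0, window_end=10.0)
```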
Timeline Graph
The timeline graph shows the activity of all threads in the process and all physical disk devices on the host computer. You can zoom in on the timeline by dragging the mouse pointer, or by using the zoom slider in the toolbar of the window, or by holding down CTRL while spinning the mouse wheel. Pause on one of the horizontal bars, or segments, to see the category, start time, and duration for that point on the thread. Click one of the segments to see the call stack on the lower part of the screen on the Current stack tab.
In the timeline graph, color indicates the state of a thread at any given time. For example, green segments are executing, red segments are blocked on synchronization, yellow segments have been preempted, and purple segments are engaged in device I/O. This view is useful for examining the balance of work among a group of threads that are involved in a parallel loop or concurrent tasks. If one or more threads take much longer to complete than the others, the workload may be unbalanced, and you may be able to improve your program's performance by distributing work more evenly among the threads.
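In the timeline, an unbalanced parallel loop shows up as some threads still executing long after others have gone idle. The following sketch uses a hypothetical triangular workload (iteration i costs i + 1 units) to show how the partitioning scheme alone can create that imbalance, and how round-robin (cyclic) assignment evens it out. It models abstract work units, not measured time, and the function names are illustrative.

```python
from typing import List


def work_per_thread(n_iters: int, n_threads: int, cyclic: bool) -> List[int]:
    """Work units per thread for a loop whose iteration i costs i + 1 units."""
    work = [0] * n_threads
    chunk = -(-n_iters // n_threads)  # ceiling division: size of each contiguous block
    for i in range(n_iters):
        # Block partitioning: contiguous chunks. Cyclic: round-robin by index.
        owner = i % n_threads if cyclic else i // chunk
        work[owner] += i + 1
    return work


def imbalance(work: List[int]) -> float:
    """Ratio of the busiest thread's work to the mean; 1.0 is perfectly balanced."""
    return max(work) / (sum(work) / len(work))


block_work = work_per_thread(1000, 4, cyclic=False)
cyclic_work = work_per_thread(1000, 4, cyclic=True)
```

With block partitioning, the last thread does about 1.75 times the average work, so its green segment would extend well past the others; with cyclic assignment the ratio drops to roughly 1.003.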
You can also use the timeline graph to examine inter-thread dependencies and the temporal relationships between blocking and blocked threads. To see how many threads are running at a given time, look at the vertical slice at that point on the timeline. If only one thread is green (executing) at that time, the application is not taking full advantage of the concurrency available on the system. From the toolbar, you can click the up and down buttons to sort and move individual threads, or use the Hide Threads button to hide uninteresting threads.
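Reading a vertical slice of the timeline amounts to counting the threads whose executing segments cover that instant. The sketch below shows that count, plus a time-weighted average over a window, using a hypothetical segment format and helper names; it illustrates the reasoning and is not a feature of the tool.

```python
from typing import List, Tuple

# Hypothetical record of one green (executing) segment: (thread_id, start_ms, end_ms).
ExecSegment = Tuple[int, float, float]


def running_at(segments: List[ExecSegment], t: float) -> int:
    """Number of threads executing at time t, i.e. a vertical slice of the timeline."""
    return len({tid for tid, start, end in segments if start <= t < end})


def average_concurrency(segments: List[ExecSegment], window_start: float,
                        window_end: float) -> float:
    """Time-weighted mean number of executing threads over the window.

    Assumes a thread's own execution segments never overlap, which holds
    because a thread is in exactly one state at a time.
    """
    busy = sum(max(0.0, min(end, window_end) - max(start, window_start))
               for _, start, end in segments)
    return busy / (window_end - window_start)


segments = [(1, 0.0, 10.0), (2, 0.0, 5.0), (3, 8.0, 10.0)]
```

Between 5 ms and 8 ms only thread 1 is executing, which is the kind of serial stretch the paragraph above describes; an average well below the number of logical cores suggests unused concurrency.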
Profile Reports
Below the timeline graph is an active legend and a tabbed window that contains several reports. Profile reports update automatically as the Threads View changes through zooming, scrolling, or hiding and unhiding threads. For large traces, the reports window is dimmed while the updated reports are calculated. Each report has two filter adjustments: Noise reduction and Just My Code. Noise reduction filters out uninteresting call tree entries in which little time is spent. The default value is 2 percent, and it can be set to any value from 0 percent to 99 percent. The Just My Code check box lets you hide or show call tree entries for code other than your own. The available reports are described in the following sections.
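One plausible reading of the Noise reduction setting is a threshold on each entry's inclusive time relative to the total. The sketch below implements that assumed rule; the tree representation, the function name, and the choice of inclusive time as the measure are assumptions rather than documented behavior. Only the 2 percent default and the 0 to 99 percent range come from the text above.

```python
from typing import Dict, Tuple

CallPath = Tuple[str, ...]  # root-first call path identifying a call tree entry


def apply_noise_reduction(tree: Dict[CallPath, float],
                          threshold_pct: float = 2.0) -> Dict[CallPath, float]:
    """Keep only entries whose inclusive time is at least threshold_pct of the total."""
    if not 0.0 <= threshold_pct <= 99.0:
        raise ValueError("threshold_pct must be between 0 and 99")
    # Total time is the sum of the root (depth-1) entries.
    total = sum(t for path, t in tree.items() if len(path) == 1)
    cutoff = total * threshold_pct / 100.0
    # A child's inclusive time never exceeds its parent's, so every kept
    # entry's ancestors are kept too and the pruned tree stays connected.
    return {path: t for path, t in tree.items() if t >= cutoff}


tree = {
    ("main",): 100.0,
    ("main", "Compute"): 97.0,
    ("main", "Log"): 1.5,
    ("main", "Log", "WriteFile"): 1.5,
}
```

With the default threshold, the Log branch (1.5 percent of the total) disappears; setting the threshold to 0 shows every entry.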
Profile Report
This tab shows the current profile report. The entry you click in the active legend determines which profile report is shown. The available profile reports are described in the following sections, starting with Execution.
Current Stack
This tab shows the call stacks for a selected thread segment in the details graph. The call stacks are trimmed to focus on activity that is directly related to your program. In the selection window, the Currently Executing thread information is immediately visible.
Unblocking Stack
Click Unblocking stack to see which thread unblocked this one and at what line of code.
Execution
The execution profile report shows a detailed table together with the percentage of time each thread spent in various states such as execution, I/O, and memory management.
Click the tree control next to any call tree entry of interest to drill down and find the line of code in which execution time is spent. When you have identified an entry of interest, right-click it and choose View Source or View Call Sites on the context menu. View Source locates the corresponding line of source, and View Call Sites locates the line of code that called this one. If only one call site exists, you are taken directly to the highlighted line of code for that call site. If multiple call sites exist, a dialog box lists them; select one and click the Go to source button to locate it. It is often most useful to locate the source for the call site that has the most instances, the greatest time, or both. For more information, see Execution Profile Report.
Synchronization
The synchronization report shows the calls that are responsible for synchronization blocks, together with the aggregate blocking times of each call stack. You can use this information to identify and investigate areas of concern.
Click the tree control next to any call tree entry of interest to drill down and find the line of code in which synchronization time is spent. When you have identified an entry of interest, right-click it and choose View Source or View Call Sites on the context menu. View Source locates the corresponding line of source, and View Call Sites locates the line of code that called this one. If only one call site exists, you are taken directly to the highlighted line of code for that call site. If multiple call sites exist, a dialog box lists them; select one and click the Go to source button to locate it. It is often most useful to locate the source for the call site that has the most instances, the greatest time, or both. For more information, see Synchronization Time.
I/O
The I/O report shows the calls that are responsible for I/O blocks, together with the aggregate blocking times of each call stack. You can use this information to identify and investigate areas of concern.
Click the tree control next to any call tree entry of interest to drill down and find the line of code in which I/O time is spent. When you have identified an entry of interest, right-click it and choose View Source or View Call Sites on the context menu. View Source locates the corresponding line of source, and View Call Sites locates the line of code that called this one. If only one call site exists, you are taken directly to the highlighted line of code for that call site. If multiple call sites exist, a dialog box lists them; select one and click the Go to source button to locate it. It is often most useful to locate the source for the call site that has the most instances, the greatest time, or both. For more information, see I/O Time (Threads View).
Sleep
The sleep report shows the calls that are responsible for sleep blocks, together with the aggregate blocking times of each call stack. You can use this information to identify and investigate areas of concern.
Click the tree control next to any call tree entry of interest to drill down and find the line of code in which sleep time is spent. When you have identified an entry of interest, right-click it and choose View Source or View Call Sites on the context menu. View Source locates the corresponding line of source, and View Call Sites locates the line of code that called this one. If only one call site exists, you are taken directly to the highlighted line of code for that call site. If multiple call sites exist, a dialog box lists them; select one and click the Go to source button to locate it. It is often most useful to locate the source for the call site that has the most instances, the greatest time, or both. For more information, see Sleep Time.
Paging
The Paging report shows the calls where paging (memory management) blocks occurred, together with the aggregate blocking times of each call stack. You can use this information to identify and investigate areas of concern.
Click the tree control next to any call tree entry of interest to drill down and find the line of code in which paging time is spent. When you have identified an entry of interest, right-click it and choose View Source or View Call Sites on the context menu. View Source locates the corresponding line of source, and View Call Sites locates the line of code that called this one. If only one call site exists, you are taken directly to the highlighted line of code for that call site. If multiple call sites exist, a dialog box lists them; select one and click the Go to source button to locate it. It is often most useful to locate the source for the call site that has the most instances, the greatest time, or both. For more information, see Memory Management Time.
Preemption
The Preemption report shows the calls where preemption blocks occurred, together with the aggregate blocking times of each call stack. You can use this information to identify and investigate areas of concern. This blocking report is less actionable than the others because preemption is typically imposed upon your process by the operating system instead of resulting from your code. It does show what kinds of preemptions occurred, where they occurred, and how long your process remained in a given preemption state.
Click the tree control next to any call tree entry of interest to drill down and find the line of code in which preemption time is spent. When you have identified an entry of interest, right-click it and choose View Source or View Call Sites on the context menu. View Source locates the corresponding line of source, and View Call Sites locates the line of code that called this one. If only one call site exists, you are taken directly to the highlighted line of code for that call site. If multiple call sites exist, a dialog box lists them; select one and click the Go to source button to locate it. It is often most useful to locate the source for the call site that has the most instances, the greatest time, or both. For more information, see Preemption Time.
UI Processing
The UI processing report shows the calls that are responsible for UI processing blocks, together with the aggregate blocking times of each call stack. You can use this information to identify and investigate areas of concern.
Click the tree control next to any call tree entry of interest to drill down and find the line of code in which UI processing time is spent. When you have identified an entry of interest, right-click it and choose View Source or View Call Sites on the context menu. View Source locates the corresponding line of source, and View Call Sites locates the line of code that called this one. If only one call site exists, you are taken directly to the highlighted line of code for that call site. If multiple call sites exist, a dialog box lists them; select one and click the Go to source button to locate it. It is often most useful to locate the source for the call site that has the most instances, the greatest time, or both. For more information, see UI Processing Time.
Per Thread Summary
This tab shows a color-coded column view of the total time that each thread spent in each state, such as running, blocked, and I/O. The columns are labeled at the bottom. At the default zoom level, the main thread is the leftmost column. When you adjust the zoom level in the details graph, the tab reports update automatically to reflect the new time scale. To keep this graph scalable, the number of threads it displays is limited. Therefore, in some cases the graph does not show data for all threads in the application, and it indicates this by displaying an ellipsis in the rightmost position. If a thread you want to see is not present, hide uninteresting threads until it appears in the graph. For more information, see Per Thread Summary Report.
File Operations
This tab shows which threads were involved in disk I/O and which files they accessed, including DLLs that were loaded, how many bytes were read, and other information. This report can help you evaluate the time spent accessing files during execution, especially when your process appears to be I/O bound. For more information, see File Operations Report (Threads View).