Windows HPC Server Glossary

Applies To: Windows HPC Server 2008, Windows HPC Server 2008 R2

This topic contains the glossary terms that are specific to Windows HPC Server 2008 R2 and Windows HPC Server 2008.

Term Definition

application network

A high-performance cluster network that is intended for Message Passing Interface (MPI) and other network-sensitive traffic. The application network is used for parallel application communication among the compute nodes in the cluster, separating network-sensitive (for example, latency-sensitive and bandwidth-sensitive) MPI traffic from cluster management traffic on the private network.

availability policy

Windows HPC Server 2008 R2 only.

A policy that is part of the node template for workstation nodes or Windows Azure worker nodes. The policy defines how and when these nodes are available for running cluster jobs. Nodes can be brought online or offline manually, or according to a weekly schedule.

With Windows HPC Server 2008 R2 SP1, the workstation node availability policy includes optional user activity detection settings. This allows you to use the workstations for cluster jobs during working and non-working hours, but only if the workstation users are not actively using them.

backfilling

A scheduling policy that maximizes cluster utilization and throughput by allowing smaller jobs lower down in the queue to run ahead of a waiting job at the top of the queue, as long as the job at the top is not delayed as a result.
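
The backfilling rule can be illustrated with a minimal sketch. It assumes that each job declares a core count and an estimated run time, and that a smaller job may run ahead only if it fits in the currently free cores and finishes before the top job could start anyway. The `Job` class, the function names, and the example numbers are hypothetical and are not part of the HPC Job Scheduler Service.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    runtime: int  # estimated run time, in minutes


def earliest_start(top, free_cores, running):
    """Minutes from now until `top` has enough cores.

    `running` is a list of (minutes_until_finish, cores) pairs.
    """
    available = free_cores
    if available >= top.cores:
        return 0
    for finish, cores in sorted(running):
        available += cores
        if available >= top.cores:
            return finish
    raise ValueError("the top job can never start on this cluster")


def backfill_candidates(queue, free_cores, running):
    """Names of waiting jobs that can run now without delaying queue[0]."""
    top, rest = queue[0], queue[1:]
    reserved_start = earliest_start(top, free_cores, running)
    chosen = []
    for job in rest:
        # The job must fit now and be done before the top job's start time.
        if job.cores <= free_cores and job.runtime <= reserved_start:
            chosen.append(job.name)
            free_cores -= job.cores
    return chosen


queue = [Job("top", 10, 120), Job("small", 2, 30),
         Job("long", 2, 90), Job("wide", 4, 20)]
# 4 cores are free; a running job releases 8 more cores in 60 minutes.
print(backfill_candidates(queue, free_cores=4, running=[(60, 8)]))
```

In this example only `small` is backfilled: `long` would still be running when the top job is due to start, and `wide` no longer fits once `small` has taken two of the free cores.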

custom diagnostic test

Windows HPC Server 2008 R2 only.

A diagnostic test created by a cluster administrator or partner. Cluster administrators can add custom tests to the list of diagnostic tests for their HPC cluster, and then run them in the same way as the built-in diagnostic tests for Windows HPC Server 2008 R2.

compute node

An on-premises, dedicated server that is added to the cluster to run jobs.

durable session

A service-oriented architecture session in which the broker node stores messages using Microsoft Message Queuing (MSMQ). Responses that are stored by the broker can be retrieved by the client at any time, even after intentional or unintentional disconnect or hardware or network failure on the cluster.

enterprise network

An organizational network that is connected to the head node and, optionally, to the cluster compute nodes. The enterprise network is often the business or organizational network that most users log on to in order to perform their work.

failover cluster

A group of independent computers that work together to increase the availability of applications and services. In an HPC cluster, a failover cluster can be configured to support either the head node or one or more Windows Communication Foundation (WCF) broker nodes. The failover cluster contains two servers that work together, so if a failure is detected in the server that is acting as the head node or WCF broker node, the other server in the failover cluster automatically begins to provide service (in a process known as failover). Even if an outage occurs on a server, the HPC cluster can continue to function.

GPGPU job

A computationally intensive program that can run on graphics processing units (GPUs). GPUs can generally process parallel computations much faster than CPUs can, and GPGPU programs can be written to take advantage of this additional processing power. Depending on the type of GPU driver that is installed on the compute nodes, job owners might have to configure additional job properties to run their GPGPU jobs in a console session.

head node

A server that provides management and job scheduling services to the compute cluster. Management services include job scheduling, job and resource management, and Windows Deployment Services (WDS). The head node can also serve as a network address translation (NAT) gateway (part of Routing and Remote Access Service (RRAS)) between the cluster private network, if one exists, and the enterprise network.

heartbeat

A compute node's response to a health probe that is sent out periodically by the HPC Job Scheduler Service. This heartbeat signal verifies node availability. By configuring scheduler properties, the cluster administrator can set the frequency of the heartbeats (HeartbeatInterval), and the number of heartbeats that a node can miss (InactivityCount) before it is marked as unreachable.
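
As a rough illustration of how the two properties interact, the sketch below treats a node as unreachable once the number of missed heartbeats reaches InactivityCount. This is one reading of the definition above, not the scheduler's actual implementation. The function names and the example values (a 30-second interval and a count of 3) are illustrative assumptions.

```python
def missed_heartbeats(seconds_since_last, heartbeat_interval):
    """Number of whole heartbeat intervals that passed without a response."""
    return int(seconds_since_last // heartbeat_interval)


def is_unreachable(seconds_since_last, heartbeat_interval, inactivity_count):
    """True once the node has missed `inactivity_count` heartbeats (assumed rule)."""
    missed = missed_heartbeats(seconds_since_last, heartbeat_interval)
    return missed >= inactivity_count


# Illustrative values: HeartbeatInterval = 30 seconds, InactivityCount = 3.
print(is_unreachable(89, 30, 3))  # two heartbeats missed
print(is_unreachable(95, 30, 3))  # three heartbeats missed
```

Under these assumptions, a node is flagged roughly HeartbeatInterval × InactivityCount seconds after its last response, so lowering either value detects failures sooner at the cost of more false alarms on a busy network.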

HPC Broker Service

An HPC system service that runs on each WCF broker node and performs the following tasks:

  • Listens for broker requests from client applications

  • Creates a broker instance (worker process) for each session

  • Manages and monitors the session

The broker instance performs the following tasks:

  • Receives requests from the client

  • Distributes requests to the service hosts on the cluster

  • Collects responses and sends them to the client

  • Stores the messages to Message Queuing if the session is a durable session

HPC Cluster Manager

A management console that can be used to perform all aspects of cluster administration, including cluster configuration, node management, job queue management, diagnostics, and reporting.

HPC Job Manager

A console that is used to create, submit, view, and manage jobs.

HPC Job Scheduler Service

An HPC system service that is responsible for queuing jobs and tasks, allocating resources, and monitoring the state of the jobs, tasks, and nodes.

HPC Management Service

An HPC system service that controls all aspects of compute cluster operation and that manages the cluster database. This service on the head node provides overall cluster management of node discovery, as well as configuration management. Each node in the cluster also runs one instance of the HPC Management Service, which communicates with the HPC Management Service on the head node and is responsible for node discovery within the cluster.

HPC MPI Service

An HPC system service that allows Message Passing Interface (MPI) executable files that are built with the MS-MPI library to run on the cluster.

Windows HPC Server includes a Microsoft implementation of MPI developed for Windows, called Microsoft Message Passing Interface (MS-MPI). MS-MPI is based on, and has application programming interfaces that are identical to, MPICH2, which is developed by the Argonne National Laboratory. MS-MPI is automatically installed on each node when Microsoft HPC Pack is installed, and is available as a no-cost MS-MPI SDK download for single use or redistribution with your MPI application.

HPC Node Manager Service

An HPC system service that runs on every compute node and communicates with the HPC Job Scheduler Service. The HPC Node Manager Service runs jobs on the node, sets task environment variables, and sends a heartbeat signal to the HPC Job Scheduler Service.

HPC service host

An executable (servicehost.exe) that runs on cluster nodes to host SOA services and performs the following tasks:

  • Loads the SOA service DLLs

  • Communicates with the broker instance on the WCF broker node

  • Raises the ServiceContext.OnExiting event if the request is canceled

HPC Services for Excel

Windows HPC Server 2008 R2 only.

A set of service-oriented architecture (SOA) clients and services that enable developers to quickly convert Microsoft Excel 2010 workbooks and create user-defined functions (UDFs) to run on a Windows HPC Server 2008 R2 cluster. If a workbook contains independent units of calculation, multiple compute nodes can perform the calculations simultaneously. Parallel computation can significantly reduce workbook calculation time and make calculations across larger data sets more feasible. For more information, see HPC Services for Excel.

HPC Session Service

An HPC system service that runs on the head node and performs the following tasks:

  • Maintains a list of the available WCF broker nodes and Message Queuing resources

  • Balances workload across the broker nodes

  • Coordinates communication between the HPC Job Scheduler and the HPC Broker Service

  • Sends the client a list of available broker nodes when a client initiates a session

HPC System Definition Model (SDM) Store Service

An HPC system service that is responsible for maintaining the integrity of data that is read from and written to the System Definition Model (SDM) data store, which is used to store cluster configuration information.

iSCSI boot node

Windows HPC Server 2008 R2 only.

A cluster node that can boot Windows HPC Server 2008 R2 from storage resources on a remote storage array by using the iSCSI protocol. iSCSI boot nodes do not require a local hard disk drive to serve as a system disk.

job

A request for cluster resources that contains, or will contain, one or more tasks. Jobs can contain serial tasks, parametric sweeps, dependency-based workflows, SOA sessions, or one or more parallel tasks using Microsoft Message Passing Interface (MS-MPI) or other message passing mechanisms.

job activation filter

A custom program that the HPC Job Scheduler Service runs when a queued job is about to start. The job activation filter checks the job for factors that would cause the job to fail if activated, such as unavailability of licenses or exceeded usage time for the submitting user.

In Windows HPC Server 2008 R2, based on the return value from the activation filter, the HPC Job Scheduler Service will start the job, block the queue until the job can start, reserve resources for the job without blocking the queue, put the job on hold, or fail the job.

In Windows HPC Server 2008, based on the return value from the activation filter, the HPC Job Scheduler Service will start the job or block the queue until the job passes the filter or is canceled.

Job Manager

See HPC Job Manager.

job owner

The user who submitted the job to the HPC cluster. A job can only be modified, copied, saved, requeued, or canceled by the job owner or by a cluster administrator. In the job list, you can see limited information about other jobs in the queue, but you must be the job owner or an administrator to view job details and tasks.

job priority

A property of a job that determines when the job will run, and how many resources the job will get. Job owners can specify the priority level for their job. Cluster administrators can specify valid priority ranges for different types of jobs (with a job template), and can modify priority level after a job is submitted.

In Windows HPC Server 2008 R2, priority is specified in terms of a priority band, a priority number, or a combination of the two. In Windows HPC Server 2008, priority is specified only in terms of a priority band. The priority bands and their corresponding numerical values are as follows:

  • Lowest (0)

  • Below Normal (1000)

  • Normal (2000)

  • Above Normal (3000)

  • Highest (4000)

The numerical priority can have a value between 0 (Lowest) and 4000 (Highest). If you enter a value numerically, it will be displayed as the corresponding priority band, or as a combination. For example, if you specify a value of 2500, the priority is displayed as Normal+500.
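
The mapping from a numerical priority to its displayed form can be sketched as follows. The band values come from the list above and the `Normal+500` format comes from the example; the display text for offsets in the other bands is extrapolated from that single example, and `format_priority` is a hypothetical helper rather than a product API.

```python
# Priority bands from the list above, highest first.
BANDS = [
    (4000, "Highest"),
    (3000, "Above Normal"),
    (2000, "Normal"),
    (1000, "Below Normal"),
    (0, "Lowest"),
]


def format_priority(value):
    """Return the band name, plus an offset when the value falls between bands."""
    if not 0 <= value <= 4000:
        raise ValueError("priority must be between 0 and 4000")
    for base, name in BANDS:
        if value >= base:
            offset = value - base
            return name if offset == 0 else f"{name}+{offset}"


print(format_priority(2500))  # Normal+500
print(format_priority(4000))  # Highest
```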

job queue

A list of jobs that have been submitted to the HPC Job Scheduler Service. The order in which jobs in the queue are run is based on job priority class and, within a priority class, on time of submission (priority-based, first-come, first-served scheduling). The order in which jobs are run is also affected by other configurable scheduling policies, such as preemption and backfilling.
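
The basic ordering rule, before preemption or backfilling is applied, amounts to a two-key sort: higher priority first, then earlier submission first. The sketch below assumes a numeric priority and a numeric submission time; the `QueuedJob` class and `run_order` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    job_id: int
    priority: int       # higher values run first
    submit_time: float  # earlier values run first within the same priority


def run_order(jobs):
    """Job IDs ordered by priority (descending), then submission time (ascending)."""
    ordered = sorted(jobs, key=lambda j: (-j.priority, j.submit_time))
    return [j.job_id for j in ordered]


jobs = [
    QueuedJob(1, 2000, 10.0),
    QueuedJob(2, 3000, 20.0),
    QueuedJob(3, 2000, 5.0),
]
print(run_order(jobs))  # [2, 3, 1]
```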

job state

The state of a job with respect to the job queue. Job states include:

  • Configuring

  • Queued

  • Running

  • Finished

  • Canceled

  • Failed

job submission filter

A custom program that the HPC Job Scheduler Service runs every time a job is submitted. The filter can check the job properties to determine if the job should be added to the queue. A submission filter can also make changes to job property values. Task property values cannot be changed. Based on the return value from the submission filter, the HPC Job Scheduler Service will add the job to the queue or fail the job.

job template

A custom submission policy that is created by the cluster administrator to manage the types of jobs coming in to the cluster. Each job template consists of a list of job properties and associated value settings (default values and constraints), and a list of users with permission to submit jobs using that job template.

job XML file

An XML file that contains job or task specifications (also called a job description file). This file allows you to preserve a job or task as a pattern for future submissions. When you create a new job or task from a description file, you can modify any of the properties before submission.

MS-MPI

See Microsoft Message Passing Interface.

Microsoft Message Passing Interface

Software that allows Message Passing Interface (MPI) executable files that are built with the MS-MPI library to run on the cluster.

node group

A named collection of nodes. In Windows HPC Server 2008 R2, there are five default node groups: HeadNodes, ComputeNodes, WCFBrokerNodes, WorkstationNodes, and AzureWorkerNodes. In Windows HPC Server 2008, there are three default node groups: HeadNodes, ComputeNodes, and WCFBrokerNodes. A cluster administrator can create custom node groups. For example, nodes can be grouped according to physical or network location, installed applications, or users.

You can specify node groups when performing administrative tasks, or for submitting jobs to a particular set of nodes.

node role

The functionality that a node performs in a cluster. A node can act as a head node, a compute node, or a WCF broker node. To fulfill a role, the node must have the role features installed and have the role enabled. When a node includes the features for multiple roles, a cluster administrator can enable and disable node roles in HPC Cluster Manager.

Changing node roles allows you to easily scale your compute nodes and WCF broker nodes depending on your current needs.

node state

The state of a node in a cluster. Possible node states include:

  • Online

  • Offline

  • Unknown

  • Provisioning

  • Starting

  • Draining

  • Removing

  • Rejected

node template

A custom policy configuration created by the cluster administrator that defines the necessary tasks for adding, configuring, and maintaining nodes in a cluster. Templates can be created for broker nodes, compute nodes, Windows Azure worker nodes, or workstation nodes. When the template is assigned, the corresponding node role is enabled automatically. Workstation and Windows Azure worker node templates also define the availability policy for the node (that is, how and when nodes are brought online and made available for running cluster jobs).

node XML file

An XML file that can be used to add nodes from bare metal or preconfigured nodes to an HPC cluster. A node XML file can specify a hardware identification parameter for each node, such as the Media Access Control (MAC) address, or another attribute such as the computer name or the physical location of the node.

nonexclusive scheduling

A scheduling policy that allows tasks from different jobs to share a node.

parallel task

For an MPI application, a task that usually consists of a single executable that is running concurrently on multiple cores, with communication occurring between the processes.

parametric sweep

A parallel computing job that runs multiple instances of the same application, usually a serial application, with different input and output files. There is no communication or interdependency among the tasks. The tasks may or may not run in parallel, depending on the resources that are available on a cluster when the job is running.
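
A parametric sweep can be pictured as one command template expanded over an index range, with each instance reading and writing its own files. In the sketch below, the `{i}` placeholder, the `expand_sweep` helper, and the application and file names are illustrative assumptions; they are not the syntax that the job scheduler itself uses.

```python
def expand_sweep(template, start, end, step=1):
    """Return one command line per sweep index, from start to end inclusive."""
    return [template.format(i=i) for i in range(start, end + 1, step)]


# Hypothetical application and file names.
for command in expand_sweep("render.exe frame{i}.in frame{i}.out", 1, 3):
    print(command)
```

Because the instances share no data, the resulting commands can run in any order and on any available cores.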

preconfigured node

A computer on which a supported Windows operating system and Microsoft HPC Pack is already installed and that is connected to the HPC cluster networks. A preconfigured node can be added by a cluster administrator to an HPC cluster by assigning a node template that does not include a step to deploy an operating system image.

private network

A dedicated, intra-cluster network that carries intra-cluster management and deployment communication between nodes. This network also carries parallel application traffic (such as MPI communication) if no application network exists.

scheduling mode

Windows HPC Server 2008 R2 only.

The part of the policy configuration for the HPC Job Scheduler Service that optimizes resource allocation for large batch and MPI workloads or for service-oriented architecture (SOA) workloads. The HPC Job Scheduler Service can run in one of the following modes:

  • Queued mode (default) is optimized for large batch and MPI workloads and parametric sweeps. The HPC Job Scheduler Service starts jobs in queue order, and attempts to allocate the maximum requested resources to running jobs. This mode attempts to finish highest priority jobs as soon as possible, give jobs their maximum requested resources, and minimize job run time. Lower priority jobs generally wait longer in the queue.

  • Balanced mode is optimized for interactive workloads, such as SOA and Excel offloading jobs, and parametric sweeps. The HPC Job Scheduler Service attempts to start all incoming jobs as soon as possible at their minimum resource requirements. After all the jobs in the queue have their minimum resources, additional cluster resources are allocated to jobs based on their priority. Resource allocation is periodically rebalanced to fill idle resources, start new jobs, and adjust allocation.
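
The difference between the two modes can be shown with a simplified allocation sketch. It assumes that every job declares a minimum and a maximum core count, that Queued mode stops at the first job that cannot get its minimum, and that Balanced mode hands out leftover cores greedily, highest priority first. The real service applies further policies (such as backfilling, preemption, and periodic rebalancing) that are not modeled here, and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    min_cores: int
    max_cores: int


def queued_mode(jobs, total_cores):
    """Start jobs in queue order, giving each as close to its maximum as possible."""
    free, grants = total_cores, {}
    for job in jobs:
        if free < job.min_cores:
            break  # this job waits, and so does everything behind it
        grants[job.name] = min(job.max_cores, free)
        free -= grants[job.name]
    return grants


def balanced_mode(jobs, total_cores):
    """Start as many jobs as possible at their minimum, then share the rest."""
    free, grants = total_cores, {}
    for job in jobs:  # pass 1: minimum resources
        if free >= job.min_cores:
            grants[job.name] = job.min_cores
            free -= job.min_cores
    for job in sorted(jobs, key=lambda j: -j.priority):  # pass 2: leftovers
        if job.name in grants and free > 0:
            extra = min(job.max_cores - grants[job.name], free)
            grants[job.name] += extra
            free -= extra
    return grants


jobs = [Job("A", 3000, 4, 16), Job("B", 2000, 4, 8), Job("C", 1000, 2, 4)]
print(queued_mode(jobs, 16))    # {'A': 16}
print(balanced_mode(jobs, 16))  # {'A': 10, 'B': 4, 'C': 2}
```

On the same 16-core cluster, the Queued sketch finishes job A as fast as possible while B and C wait, whereas the Balanced sketch starts all three jobs at once and gives only the spare cores to A.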

scheduling policies

Rules that determine the order in which to run jobs from the queue, and determine how cluster resources are allocated to these jobs. As a cluster administrator, you can adjust how resources are allocated to jobs, and how jobs are handled, by configuring job scheduling policy options and by creating job templates that leverage the job scheduling policies. Job scheduling policies include:

  • Priority-based first come, first served

  • Queued or Balanced scheduling mode

  • Preemption

  • Backfilling

  • Adaptive resource allocation (grow/shrink)

service configuration file

An XML file that is used to register the Service-Oriented Architecture (SOA) services on the cluster. In Windows HPC Server 2008 R2, the configuration file can also include settings that customize how the broker interacts with the service and how the broker interacts with the client.

task

A logically discrete section of computational work. A task cannot be run independently of a job, but a job can consist of only one task.

task flow job

A job with a set of tasks that run in a prescribed order, usually because one task depends on the result of another task. You can establish the order in which tasks are run by defining dependencies between the tasks.
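
Deriving a valid run order from task dependencies is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the task names are hypothetical, and the scheduler's own dependency handling may differ.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "merge": {"simulate_a", "simulate_b"},
}


def task_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = task_order(dependencies)
print(order)  # "prepare" comes first and "merge" comes last
```

The two `simulate` tasks depend only on `prepare`, so they may appear in either order and could run in parallel once `prepare` has finished.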

UDF offloading

Enabling Microsoft Excel user-defined functions (UDFs) to run in a Windows HPC Server 2008 R2 cluster. Parallel computation can significantly reduce workbook calculation time and make calculations across larger data sets more feasible. To run on the cluster, the UDFs must be registered as cluster-safe, and they must be contained in an XLL file. The cluster administrator deploys the XLL and its dependencies to the cluster, and the Excel user can use advanced Excel properties to specify the cluster to use for offloading.

To support UDF offloading, HPC Services for Excel include a built-in client (Excel Cluster Connector, an add-in for Excel 2010) and two built-in XLL container services (for 32-bit and 64-bit XLLs) that run on the compute nodes. The XLLs run in the container services, and do not require Excel 2010 on the compute nodes.

Windows Azure worker node

Windows HPC Server 2008 R2 only.

A Windows Azure computing resource (which runs a guest operating system that is substantially compatible with Windows Server 2008 R2) that can be added to the Windows HPC cluster to run jobs. Windows Azure provides on-demand access to computational resources and storage in the cloud. HPC Pack 2008 R2 SP1 includes node templates for Windows Azure worker nodes that you can use to specify Windows Azure subscription information and an availability policy. You can use the node template to deploy a set of Azure nodes from HPC Cluster Manager, and then start and stop the nodes on-demand.

Windows Communication Foundation (WCF) broker node

A node that routes Windows Communication Foundation (WCF) calls between a Service-Oriented Architecture (SOA) client and the SOA services that are running on the cluster. This functionality can be provided by one or more dedicated nodes, or by adding a WCF broker role to the head node. WCF broker nodes must be connected to the enterprise network.

In Windows HPC Server 2008 R2, if the session is a durable session, the broker stores messages using Microsoft Message Queuing (MSMQ).

Workbook offloading

Enabling a Microsoft Excel workbook to run in parallel on a cluster by defining how to partition independent calculations in a workbook and then merge the results. Developers can use the HPC macro framework to define the partitions, and include VBA code that allows users to submit the workbook to a cluster. The cluster administrator deploys the workbook and any of its dependencies to the cluster, and ensures that Excel 2010 is installed on the compute nodes.

To support workbook offloading, HPC Services for Excel include a built-in SOA client (Excel Client) and a built-in service (Excel Service).

workstation node

Windows HPC Server 2008 R2 only.

A workstation computer (running the Windows 7 operating system) on the enterprise network that is added to the Windows HPC cluster to run cluster jobs. Workstation nodes are ideal for small, short-running jobs that can be stopped and restarted.

Workstation nodes are brought online to run cluster jobs based on an availability policy that defines how and when workstation nodes are available for running cluster jobs.
