Use the patching wizard to add software updates to a node template
Use workstations to run cluster jobs
Create customizable dashboards that allow you to monitor nodes at a glance
Save a command or script as a diagnostic test in HPC Cluster Manager
SOA scheduling and runtime
Optimize job scheduling for SOA jobs and interactive workloads
Manage SOA service configuration settings from a single location
Enable and collect trace logs to troubleshoot SOA sessions
Job scheduling and runtime
Provide accurate job prioritization for your cluster
Check for license availability before a job is started
Stop a running job or task immediately
Exclude particular nodes from running tasks in your job
Receive notification when your job is done
Provision or clean up the nodes that are allocated to your job
Provide custom job progress information
Allow canceled tasks time to save state information or clean up before exiting
Cluster Management
The scenarios in this section help you try new management features in Windows HPC Server 2008 R2.
Use the patching wizard to add software updates to a node template
Scenario
You have deployed cluster nodes, and now you want to use the node templates to manage and apply software updates (patches) to the nodes.
Goal
Use the Add Software Updates Wizard to add an Apply Updates task to a node template.
Requirements
A cluster with a Windows HPC Server 2008 R2 head node.
The head node must be able to access the Microsoft Update website or the WSUS server in your enterprise.
Administrative permissions on the cluster.
Steps
The Maintenance phase of a node template can include an Apply Updates task whose settings specify which updates to apply. When you run the Maintain action on nodes, the Apply Updates task downloads updates to the compute nodes from the Microsoft Update website or the WSUS server in your enterprise, and then installs the updates.
Note
The Apply Updates task in the node template cannot install updates that include software license terms (also known as an End User License Agreement or EULA). This type of update requires the administrator of each node to accept the software license terms before the update can be installed. In this scenario, you have to install updates that include software license terms manually.
The following procedure describes how to add the Apply Updates task to a node template. The node template must have already been used to deploy one or more nodes.
In HPC Cluster Manager, click Configuration, and then click Node Templates.
Right-click a node template, and then click Add Software Updates.
Follow the steps in the Add Software Updates Wizard to select the update level and the specific updates to add. In the final step, you can choose to install the updates now or later.
To make changes to the Apply Updates task, you can run the wizard again, or right-click the node template and click Edit.
Expected results
An Apply Updates task is added to the Maintenance phase of the selected node template.
Related Resources
For more information about applying updates by using an enterprise WSUS server or a node template, see the Best Practices topic in the updating nodes step-by-step guide (https://go.microsoft.com/fwlink/?LinkId=194794).
Use workstations to run cluster jobs
Scenario
You have powerful workstation computers that are not utilized overnight and on weekends. You want to harvest this processing power to run cluster jobs.
Create customizable dashboards that allow you to monitor nodes at a glance
Scenario
When administering clusters of up to 1000 nodes, you need to be able to create customizable dashboards that let you monitor several node metrics for the entire cluster at a glance. To identify outliers and bottlenecks more easily and to switch between views quickly, you can create multiple node list or heat map tabs that focus on sets of information such as:
Network view
CPU or disk load
Application trends for large MPI jobs
Goal
Create one or more new tabs in Node Management.
Requirements
A head node with Windows HPC Server 2008 R2 installed.
Administrative permissions on the cluster.
Steps
In HPC Cluster Manager, click Node Management.
In the Navigation Pane, click Nodes.
In the view pane, click the blank tab, and then click Customize Tab.
Type a name for the tab, and then click List View or Heat Map View.
Add one or more metrics.
Click Apply at any time to see the current tab configuration.
Click OK to save your changes to the tab.
In the view pane, use the slider to adjust the heat map zoom, or click the Fit to window icon to adjust the tile size automatically for the best fit.
You can organize the heat map view by location by clicking the Group by location icon. Remove the location grouping by clicking the Group by name icon. (You can specify the node location property in the node XML or by selecting a node and clicking Edit.)
To change the settings on a tab, right-click the tab name, and then click Customize Tab.
If you are creating a Heat Map tab, you can customize the following display options:
Color scale: The minimum value for a metric is associated with a color, for example, white, and the maximum value for that metric is associated with another color, for example, blue. In this case, lower values for that metric appear as lighter shades of blue, and higher values appear as darker shades of blue. For each metric, you can customize the maximum and minimum values and associated colors. You can also flip the scale so that the minimum values are darker, and the maximum values are lighter.
Linear or logarithmic color banding: The color bands that are used to display metric values can be displayed in linear or logarithmic scales. In a linear scale, the color bands are equally sized across the defined value range. In a logarithmic scale, the color bands are logarithmically sized across the value range. Logarithmic scale is useful when you want to visually distinguish values at one end of the value range.
Stacking or overlaying metrics: You can view multiple metrics in Stacking or Overlaying view. Stacking displays a color bar for each metric. Overlaying displays only the most significant metric for each node. Significance is based on the order in which the metrics are defined in the Customize Tab dialog box. The first metric is displayed by default. If a metric value reaches the darkest color band, that is the metric that is displayed. If more than one metric reaches the darkest color band, the first one listed is the one that is displayed.
Metric aggregation period: Aggregate metrics over a short time period by increasing the number of seconds for the metric value display.
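The banding and overlay rules above can be illustrated with a small Python sketch. It is not the HPC Cluster Manager implementation: the function names and the exact band formulas (equal-width bands for linear, bands of equal width in log space for logarithmic) are assumptions chosen to match the descriptions. Band 0 is the lightest color and the last band is the darkest.

```python
import math
from typing import Dict, Tuple


def color_band(value: float, vmin: float, vmax: float, bands: int,
               scale: str = "linear", flip: bool = False) -> int:
    """Return the color band for a metric value (0 = lightest, bands - 1 = darkest)."""
    if vmax <= vmin:
        raise ValueError("the maximum must be greater than the minimum")
    value = min(max(value, vmin), vmax)  # clamp into the configured range
    if scale == "linear":
        # Equally sized bands across the value range.
        fraction = (value - vmin) / (vmax - vmin)
    elif scale == "log":
        # Bands of equal width in log space; needs a positive minimum.
        if vmin <= 0:
            raise ValueError("logarithmic banding needs a positive minimum")
        fraction = math.log(value / vmin) / math.log(vmax / vmin)
    else:
        raise ValueError(f"unknown scale: {scale}")
    band = min(int(fraction * bands), bands - 1)  # the maximum falls in the top band
    # A flipped scale makes minimum values dark and maximum values light.
    return bands - 1 - band if flip else band


def overlay_metric(values: Dict[str, float],
                   ranges: Dict[str, Tuple[float, float]],
                   bands: int, scale: str = "linear") -> str:
    """Choose the metric shown for one node in Overlaying view.

    `values` preserves the order in which metrics are defined in Customize Tab.
    """
    for name, value in values.items():
        vmin, vmax = ranges[name]
        if color_band(value, vmin, vmax, bands, scale) == bands - 1:
            return name  # first metric that reaches the darkest band
    return next(iter(values))  # otherwise the first metric is shown
```

For example, with a 0 to 100 range split into 10 bands, a value of 55 lands in band 5, and in Overlaying view a node whose second metric reaches the darkest band shows that metric instead of the first one.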
Expected results
Your new tab configurations are saved, and you can easily switch between different views of your nodes.
The node filters that you apply persist across all tabs.
The nodes that you select in one tab are selected when you go to a different tab.
Save a command or script as a diagnostic test in HPC Cluster Manager
Scenario
When managing your cluster, there are some commands or scripts that you run regularly to check the status of your nodes. You would like to be able to run your own tests and the built-in tests from a single location.
Goal
Save the fsutil volume diskfree command as a diagnostic test that checks for free disk space on your nodes and then run the test and view results from HPC Cluster Manager.
Requirements
A head node with Windows HPC Server 2008 R2 installed.
Administrative permissions on the cluster.
Steps
Step 1: Define the test
On the head node, open a text editor such as Notepad, and paste in the following XML code. Optionally, change the Company name to your name.
SOA scheduling and runtime
The scenarios in this section help you try new SOA scheduling and runtime features in Windows HPC Server 2008 R2.
Optimize job scheduling for SOA jobs and interactive workloads
Scenario
Your cluster runs mostly interactive workloads, such as service-oriented architecture (SOA) jobs. One or two large jobs may be taking up most of the cluster, but there are many other interactive jobs that need to run. You want as many jobs to start as possible, rather than having most of the resources allocated to the top of the job queue.
To optimize job scheduling for interactive workloads, you can change the scheduling mode from Queued to Balanced.
In Balanced mode, the scheduler attempts to start all incoming jobs as soon as possible at their minimum resource requirements. After all the jobs in the queue have their minimum resources, additional cluster resources are allocated to jobs based on their load and priority. Resource allocation is periodically rebalanced to fill idle resources and accommodate new jobs.
Goal
Change the scheduling mode from Queued to Balanced.
Requirements
A Windows HPC Server 2008 R2 cluster with at least one compute node and one broker node.
Administrative permissions on the cluster.
Steps
Use one of the following methods to change the Scheduling Mode from Queued to Balanced:
In HPC Cluster Manager, click Options, and then click Job Scheduler Configuration. Scheduling Mode and the associated settings can be configured on the Policy Configuration tab.
Run HPC PowerShell as an administrator, and then type:
Set-HpcClusterProperty -SchedulingMode Balanced
Open a Command Prompt window as an administrator, and then type:
cluscfg setparams SchedulingMode=Balanced
Submit several jobs to the cluster.
After you have set the Balanced mode, you can adjust how additional resources are allocated with the PriorityBias setting, and how often the scheduler rebalances with the ReBalancingInterval setting.
PriorityBias controls how additional resources are allocated to jobs. In Balanced mode, “additional resources” refers to cluster resources beyond the total minimum resources of all running jobs. Tasks that are running on additional resources can be canceled with immediate preemption to accommodate new jobs or to converge on the desired allocation pattern. You can choose from the following three options:
HighBias: All additional resources are allocated to higher priority jobs.
MediumBias (Default): Each priority band is given a higher proportion of additional resources than the band below it. The priority bands are Highest, Above Normal, Normal, Below Normal, and Lowest.
NoBias: Resources are allocated equally regardless of priority.
ReBalancingInterval represents the time, in seconds, between scheduler rebalancing passes.
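The following Python sketch models the two allocation phases of Balanced mode: every job first receives its minimum, and the additional resources are then shared according to PriorityBias. It is a simplified illustration, not the scheduler's algorithm. The job fields (id, priority, min, max) are hypothetical, resources are counted as cores, periodic rebalancing (ReBalancingInterval) and preemption are not modeled, and only HighBias and NoBias are implemented because this guide does not give the exact MediumBias proportions.

```python
from typing import Dict, List


def balanced_allocation(jobs: List[dict], total_cores: int, bias: str) -> Dict[str, int]:
    """Allocate cores to jobs listed in queue order.

    Each job is a dict with hypothetical keys: id, priority, min, max.
    """
    alloc: Dict[str, int] = {}
    free = total_cores
    # Phase 1: start every job that fits at its minimum, in queue order.
    # (Simplification: a job whose minimum does not fit just waits.)
    for job in jobs:
        if job["min"] <= free:
            alloc[job["id"]] = job["min"]
            free -= job["min"]
    started = [job for job in jobs if job["id"] in alloc]

    # Phase 2: share the additional cores according to the priority bias.
    if bias == "HighBias":
        # Highest priority first; sorted() is stable, so queue order breaks ties.
        for job in sorted(started, key=lambda j: -j["priority"]):
            extra = min(job["max"] - alloc[job["id"]], free)
            alloc[job["id"]] += extra
            free -= extra
    elif bias == "NoBias":
        # One core at a time, round-robin, ignoring priority.
        progress = True
        while free > 0 and progress:
            progress = False
            for job in started:
                if free > 0 and alloc[job["id"]] < job["max"]:
                    alloc[job["id"]] += 1
                    free -= 1
                    progress = True
    else:
        raise ValueError(f"{bias} is not modeled in this sketch")
    return alloc
```

With two jobs that each need 1 to 8 cores on a 10-core cluster, HighBias gives the Highest-priority job 8 cores and the Normal job 2, whereas NoBias gives each job 5.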
You can use one of the following methods to change PriorityBias and ReBalancingInterval:
Run HPC PowerShell as an administrator, and then type:
Expected results
Jobs are started as soon as possible at their minimum resource requirements. After all jobs in the queue have started, all remaining resources in the cluster are allocated to jobs based on their priority and workload. As new jobs start, cluster resources are reallocated in proportion to each job’s priority.
Related Resources
Queued mode provides priority-based, first-come, first-served scheduling, as in Windows HPC Server 2008. For more information, see Understanding Job Scheduling Policies (https://go.microsoft.com/fwlink/?LinkId=177866).
Manage SOA service configuration settings from a single location
Scenario
You have multiple SOA services installed to a central location on your cluster, and you want the ability to see all of the deployed services, change settings to help diagnose and troubleshoot specific services, and modify the service configuration files from a centralized location.
In HPC Cluster Manager, in Configuration, the Services view lets you:
See a list of all of the SOA services that are centrally deployed on your cluster (services that are deployed locally on compute nodes are not included).
Run diagnostics to verify that the DLLs for the service can be loaded on the specified nodes, and that any detected dependencies for the DLL are present on the nodes.
Open the service configuration file in an editor.
Set event level tracing.
Configure error log output.
Goal
Add a service on the cluster and manage the service configuration settings from HPC Cluster Manager.
Requirements
A Windows HPC Server 2008 R2 cluster with at least one compute node and one WCF Broker node.
Administrative permissions on the cluster.
A SOA service assembly (DLL) and a service configuration file. The configuration file must be named servicename.config, where servicename is the same name that is passed to the SessionStartInfo constructor.
Write permissions on the configuration file to edit the file.
A client application that starts an HPC session for that service.
Optionally, a client computer with HPC Pack 2008 R2 installed.
Steps
Copy your service .dll file to a folder named C:\ServicesR2 on each compute node.
On the head node, copy the service configuration file to the C:\Program Files\Microsoft HPC Pack 2008 R2\ServiceRegistration folder.
Click Start, point to All Programs, click Microsoft HPC Pack 2008 R2, and then click HPC Cluster Manager.
In HPC Cluster Manager, click Configuration, and then click Services.
The view pane displays a list of all services that have configuration files in the ServiceRegistration folder. Verify that the service that you just added appears in the list.
Right-click your service, and then click Edit Configuration File. The configuration file for your service opens in the default XML editor.
Ensure that the assembly attribute of the service element points to the location of your service .dll (C:\ServicesR2\<yourServiceName>.dll). For example:
Save the changes, if you made any, and then close the text editor.
To verify that the service can be loaded, right-click the service, and then click Run SOA Service Loading Diagnostic Test.
The Run Diagnostic Tests dialog box appears, and the service that you selected is automatically specified in the parameter for the test. Click Run.
To view test results: In Diagnostics, in the Navigation Pane, click Test Results.
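Before running the diagnostic test, you can check the two naming rules from this scenario with a short Python sketch: the configuration file must be named servicename.config, and a service element must carry an assembly attribute. The function name is hypothetical, the case-insensitive file name comparison is an assumption, and the sketch assumes the service element is not in an XML namespace; adjust it if your configuration file differs.

```python
import os
import xml.etree.ElementTree as ET


def check_service_config(config_path: str, service_name: str) -> str:
    """Validate a centrally deployed service configuration file.

    Returns the assembly path declared on the service element.
    """
    # Rule 1: the file must be named <servicename>.config
    # (compared case-insensitively, as Windows file names are).
    expected = f"{service_name}.config"
    actual = os.path.basename(config_path)
    if actual.lower() != expected.lower():
        raise ValueError(f"{actual} should be named {expected}")

    # Rule 2: a <service> element must carry an assembly attribute.
    # Assumes the element is not in an XML namespace.
    root = ET.parse(config_path).getroot()
    for service in root.iter("service"):
        assembly = service.get("assembly")
        if assembly:
            return assembly
    raise ValueError("no <service assembly=\"...\"> element found")
```

If you followed the steps above, the returned path should be C:\ServicesR2\<yourServiceName>.dll.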
Expected results
Service configuration files that you put in the C:\Program Files\Microsoft HPC Pack 2008 R2\ServiceRegistration folder appear in the Configuration section in HPC Cluster Manager.
The service loading diagnostic test checks for detected DLL dependencies.
Related Resources
Enable and collect trace logs to troubleshoot SOA sessions
Enable and collect trace logs to troubleshoot SOA sessions
Scenario
You have a development cluster and you are testing SOA clients and services. Your service DLL includes code to generate trace information.
Goal
Enable tracing on the head node and collect the trace logs from each node that was used during the session.
Requirements
A Windows HPC Server 2008 R2 cluster with at least one compute node and one WCF Broker node.
Administrative permissions on the cluster.
A SOA service assembly (DLL) and a service configuration file deployed to the cluster. The service configuration file must be centrally deployed. For more information, see Manage SOA service configuration settings from a single location.
A client application that starts an HPC session for that service.
Optionally, a client computer with HPC Pack 2008 R2 installed.
Steps
When you enable tracing in the service configuration file, the trace information is logged to a file on the compute nodes. The log files trace the steps of each service call and the intermediate results on the cluster. You can collect and remove trace logs by using the Job Management view or the HPC PowerShell cmdlets. You can view the trace log files with the WCF Service Trace Viewer (SvcTraceViewer.exe).
Click Start, point to All Programs, click Microsoft HPC Pack 2008 R2, and then click HPC Cluster Manager.
In HPC Cluster Manager, click Configuration, and then click Services.
Right-click the service that you want to troubleshoot, and then click Set Event Logging Level. In the dialog box, select the desired trace level and then click OK.
Start a session with that service.
Click Job Management, and then click All Jobs.
In the job list, find the job that is associated with the session that you are debugging. The job ID is the same as the session ID.
Right-click the job, and then click Collect Trace.
In the Collect Trace dialog box, specify the shared folder where you would like to collect the trace logs. The folder must be accessible from the compute nodes.
Verify that the trace logs appear in the specified folder.
Right-click the job, and then click Delete Trace to delete the trace logs from the compute nodes.
Important
Event logging is not generally recommended for production environments. After collecting the trace logs, ensure that you delete them from the compute nodes to avoid consuming disk space.
Job scheduling and runtime
The scenarios in this section help you try new job scheduling and runtime features in Windows HPC Server 2008 R2.
Provide accurate job prioritization for your cluster
Scenario
Your cluster serves many departments and user groups, and you need accurate job prioritization to meet business needs. Each department has a prioritized list of jobs, and you want the jobs from each department to run in the requested order. Occasionally, you need to make adjustments to the order of the job queue based on particular circumstances or needs.
Priority and submit time help determine when a job runs and how many resources it gets. When multiple jobs are submitted with the same priority level, the job scheduler attempts to start the jobs in each priority level on a first-come, first-served basis. To ensure that business needs have a stronger impact on the order of the job queue than submit time, you ask cluster users to specify a granular priority level for each job.
Goal
Users submit jobs with numerical priority levels.
When necessary, manually adjust priority levels on submitted jobs.
Requirements
A head node with Windows HPC Server 2008 R2 installed
Administrative permissions on the cluster
Steps
In HPC Pack 2008 R2, the job priority can have a value between 0 and 4000. Users can specify priority in terms of a priority band, a priority number, or a combination of the two. The priority bands and their corresponding numerical values are as follows:
Lowest (0)
BelowNormal (1000)
Normal (2000)
AboveNormal (3000)
Highest (4000)
The numerical priority can have a value between 0 (Lowest) and 4000 (Highest). If you enter a value numerically, it is displayed as the corresponding priority band, or as a combination. For example, if you specify a value of 2500, the priority is displayed as Normal+500.
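The mapping between numbers and bands can be sketched in Python. The display rule (nearest lower band plus an offset, as in Normal+500) is inferred from the example above, and parsing assumes that a combined value is written the same way; both function names are hypothetical.

```python
# Priority bands from highest to lowest, with their numerical values.
BANDS = [("Highest", 4000), ("AboveNormal", 3000), ("Normal", 2000),
         ("BelowNormal", 1000), ("Lowest", 0)]
BAND_VALUES = dict(BANDS)


def priority_label(value: int) -> str:
    """Display a numerical priority as a band, or as band+offset."""
    if not 0 <= value <= 4000:
        raise ValueError("priority must be between 0 and 4000")
    for name, base in BANDS:  # first band at or below the value
        if value >= base:
            offset = value - base
            return name if offset == 0 else f"{name}+{offset}"
    raise AssertionError("unreachable: Lowest starts at 0")


def parse_priority(text: str) -> int:
    """Convert "Normal", "2500", or "Normal+500" to a number from 0 to 4000."""
    text = text.strip()
    if text.isdigit():
        value = int(text)
    else:
        name, _, offset = text.partition("+")
        value = BAND_VALUES[name] + (int(offset) if offset else 0)
    if not 0 <= value <= 4000:
        raise ValueError("priority must be between 0 and 4000")
    return value
```

For example, priority_label(2500) returns "Normal+500", and parse_priority("Normal+500") returns 2500.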
Monitor and adjust the job queue
Cluster administrators and job owners can modify the Priority job property for any active job (Queued or Running).
Run HPC PowerShell.
Type the following cmdlets to see a list of active jobs in job queue order (descending priority and ascending submit time):
Modify priority levels on submitted jobs as appropriate. You can use the Set-HpcJob cmdlet to modify the priority of a job. For example, the following cmdlet sets a priority level of 2550 for job 122:
Set-HpcJob -Id 122 -Priority 2550
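The queue order used in this procedure (descending priority, then ascending submit time) can be reproduced in Python, for example to check a list of jobs exported from the cluster. The job records and their field names (id, state, priority, submit_time) are hypothetical; the sketch only shows the ordering rule.

```python
from typing import List


def queue_order(jobs: List[dict]) -> List[dict]:
    """Return active (Queued or Running) jobs in job queue order.

    submit_time can be any comparable value, such as a datetime.
    """
    active = [job for job in jobs if job["state"] in ("Queued", "Running")]
    # Negating priority sorts higher priorities first;
    # among equal priorities, the earlier submit time comes first.
    return sorted(active, key=lambda job: (-job["priority"], job["submit_time"]))
```

After job 122 is raised to 2550, it sorts ahead of every Normal (2000) job but still behind any AboveNormal (3000) job.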
Expected results
Cluster administrators and job owners can modify the Priority job property for any active job (Queued or Running).
The job scheduler attempts to start jobs in priority order. Scheduling policies (such as backfilling) and activation filters can affect the order in which jobs start.
Related Resources
Check for license availability before a job is started
Check for license availability before a job is started
Scenario
Your cluster runs several applications that use licenses that are shared on a licensing server. You want to:
Schedule jobs efficiently and reduce the number of jobs that are failing due to unavailable licenses
Maintain first-come, first-served scheduling when jobs are waiting for licenses
Make efficient use of the cluster when jobs are waiting for licenses
The HPC Job Scheduler Service can run a custom activation filter on queued jobs that are about to start. A job activation filter is a custom application that you can write to provide additional checks and controls, such as checking for license availability. Depending on the return value from your filter, the HPC Job Scheduler Service takes the appropriate action on the job.
The HPC 2008 R2 SDK samples include an example of an activation filter that checks for license availability against a FlexLM license file.
Goal
Build and try the Activation Filter sample that is included in the HPC 2008 R2 SDK samples (HPC2008R2.SampleCode.zip). The sample is a Visual Studio 2008 project named FlexLM.sln that is in the HPC2008R2.SampleCode\Scheduler\Activation Filter folder.
Requirements
A head node with Windows HPC Server 2008 R2 installed
FlexLM license file copied to the head node. The path and filename should not contain spaces.
Steps
FlexLM.sln includes a sample activation filter that checks for license availability, and the FlexLM.exe.config file that you can use to specify the location of the FlexLM utilities and license file. The FlexLM project properties include custom pre-build and post-build event commands. The pre-build commands create the files that are needed to create event log entries. The post-build commands unregister any old version of the FlexLM activation filter and then register the new version so that it can create events and Event Viewer can display them. The commands assume that Visual Studio is creating the files in C:\Program Files\Microsoft HPC Pack 2008 R2\Bin\. The cluscfg command tells the HPC Job Scheduler Service to use the new activation filter.
The following steps describe how to configure and build the solution:
On the head node, run Visual Studio 2008.
Open FlexLM.sln.
Open the Jobs.cs file and update the reference to the job xml schema (from HPCS2008 to HPCS2008R2) in the ParseJobXml() method as follows:
If you do not perform this step, the sample filter will not be able to parse the job XML file.
In the FlexLM.exe.config file, specify the paths to the FlexLM utility (in PollCommandName) and the FlexLM license file (in PollCommandArguments). The following XML code example shows how these paths are specified:
Build the solution. The program deploys the DLL and .config files on the head node, creates event log files, and adds the activation filter to the Job Scheduler parameters.
To test the filter, submit jobs to the cluster that require licenses.
Expected results
The following list describes the supported exit codes for an activation filter, and the corresponding Job scheduler action:
0: The job is started.
1: The job is not started and remains in the queue. The filter reevaluates the job periodically until either the job passes or it is canceled. No other jobs of equal or lower priority are started until the job passes or is canceled.
2: The job is not started, but available resources are reserved for it depending on the Scheduling Mode: In Queued, up to the job’s maximum resources are reserved; in Balanced, the minimum resources are reserved. Other jobs can be started on other resources. The filter reevaluates the job periodically until the job passes.
3: The job is put on hold until the date and time specified by the Hold Until job property. After the hold period, the job is reevaluated by the filter program. If the filter returns with exit code 3 and no Hold Until value is specified for that job, the job is held for the amount of time specified by the Default Hold Duration cluster setting.
4: The job is marked as Failed with an error message that the job was failed by the activation filter.
Any other exit code: Undefined.
Filter timeout: Same as exit code 2.
Filter not found: Same as exit code 2.
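The exit-code contract above is what an activation filter must honor. The following Python sketch shows a license-checking decision in isolation; it is not the FlexLM.sln sample. It assumes that the filter already knows how many licenses of each feature the job needs (how the sample reads this from the job XML is not covered here) and that license usage comes from FlexLM lmutil lmstat -a output in its usual "Users of feature: (Total of N licenses issued; Total of M licenses in use)" form. It deliberately returns exit code 2 rather than 1 so that other jobs can keep running while this one waits.

```python
import re
from typing import Dict

# Activation filter exit codes, as listed above.
START = 0        # start the job
BLOCK_QUEUE = 1  # keep in queue; block jobs of equal or lower priority
RESERVE = 2      # keep in queue; reserve resources, let other jobs start
HOLD = 3         # hold until Hold Until (or the Default Hold Duration)
FAIL = 4         # mark the job as Failed

# Matches lmstat lines such as:
#   Users of solver:  (Total of 10 licenses issued;  Total of 3 licenses in use)
LMSTAT_LINE = re.compile(
    r"Users of (?P<feature>[^:]+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<used>\d+) licenses? in use\)")


def free_licenses(lmstat_output: str) -> Dict[str, int]:
    """Return feature -> number of licenses not currently in use."""
    return {m["feature"].strip(): int(m["issued"]) - int(m["used"])
            for m in LMSTAT_LINE.finditer(lmstat_output)}


def decide(required: Dict[str, int], lmstat_output: str) -> int:
    """Return the exit code for a job that needs `required` licenses per feature."""
    free = free_licenses(lmstat_output)
    for feature, needed in required.items():
        if free.get(feature, 0) < needed:
            # RESERVE keeps the cluster busy with other jobs;
            # BLOCK_QUEUE would preserve first-come, first-served order instead.
            return RESERVE
    return START
```

A filter built this way would pass decide(...) to sys.exit() so that the HPC Job Scheduler Service receives the code.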
Related Resources
None.
Stop a running job or task immediately
Scenario
You want to stop a running job or task immediately.
In Windows HPC Server 2008 R2, a cluster administrator defines a Task Cancelation Grace Period that gives canceled tasks time to save state information and clean up before exiting. To use the grace period, the application must process the CTRL_BREAK event. If the application does not process the event, the task exits immediately. For a service to use the grace period, it must process the ServiceContext.OnExiting event. Job owners can define Node Release tasks for their jobs to perform data or log file collection or to return nodes to their pre-job state. The Node Release task runs when a job is about to release a node, including when the job is canceled.
You can force cancel a job or task to skip grace periods and node release tasks.
Goal
Force cancel a job or task.
Requirements
A head node with Windows HPC Server 2008 R2 installed.
Administrative or user permissions on the cluster.
Steps
Submit a job with at least two tasks including:
A task that runs an application that responds to the CTRL_BREAK event.
A Node Release task.
To force cancel a task, use one of the following methods. Include the -force parameter (/force at a command prompt), and specify the ID of your job and task, and, optionally, the subtask.
In HPC PowerShell, use the following cmdlet: Stop-HpcTask -JobId <yourJobID> -TaskID <yourTaskID> [-subTaskID <yourSubTaskID>] -force
At a command prompt, use the following command: task cancel <yourJobID>.<yourTaskID>[.<yourSubTaskID>] /force
To force cancel a job, use one of the following methods. Include the -force parameter (/force at a command prompt), and specify the ID of your job.
In HPC PowerShell, use the following cmdlet: Stop-HpcJob <yourJobID> -force
At a command prompt, use the following command: job cancel <yourJobID> /force
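If you cancel jobs from a script, you can build the force-cancel command lines above programmatically. This Python sketch only assembles the task cancel and job cancel commands shown in this procedure; the helper names are hypothetical, and run() works only on a computer where the HPC Pack command-line tools are on the PATH.

```python
import subprocess
from typing import List, Optional


def task_cancel_args(job_id: int, task_id: int,
                     sub_task_id: Optional[int] = None,
                     force: bool = True) -> List[str]:
    """Build: task cancel <jobID>.<taskID>[.<subTaskID>] [/force]"""
    target = f"{job_id}.{task_id}"
    if sub_task_id is not None:
        target += f".{sub_task_id}"
    args = ["task", "cancel", target]
    if force:
        args.append("/force")  # skip the grace period
    return args


def job_cancel_args(job_id: int, force: bool = True) -> List[str]:
    """Build: job cancel <jobID> [/force]"""
    args = ["job", "cancel", str(job_id)]
    if force:
        args.append("/force")  # skip the grace period and Node Release tasks
    return args


def run(args: List[str]) -> None:
    """Run the command; raises CalledProcessError on a nonzero exit code."""
    subprocess.run(args, check=True)
```

For example, run(task_cancel_args(12, 3)) runs task cancel 12.3 /force.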
Expected results
Force canceling a task: the task stops immediately and does not use the Task Cancelation Grace Period (without force, the application must process the CTRL_BREAK event to make use of the grace period).
Force canceling a job: the job stops immediately. The tasks in the job do not use the Task Cancelation Grace Period, and the Node Release task does not run.
Related Resources
Allow canceled tasks time to save state information or clean up before exiting
Exclude particular nodes from running tasks in your job
Scenario
You notice that one particular node keeps failing tasks in your job. You want the job scheduler to stop scheduling your tasks on that node.
In Windows HPC Server 2008 R2, you can specify a list of nodes to exclude from your job.
Note
For SOA jobs, the broker node automatically updates and maintains the list of excluded nodes according to the EndPointNotFoundRetryPeriod setting (in the service configuration file). This setting specifies how long the service host should retry loading the service and how long the broker should wait for a connection. If this time elapses, the broker adds the node (service host) to the Excluded Nodes list. The service configuration also includes the maxExcludedNodes setting that specifies how many nodes can be excluded before the session fails.
Goal
Add one or more nodes to the Excluded Nodes job property.
See all excluded nodes on the cluster (Administrator).
Requirements
A head node with Windows HPC Server 2008 R2 installed.
Administrative or user permissions on the cluster.
Steps
Defining excluded nodes for a job
For any active job, you can add or remove nodes in the Excluded Nodes job property, or clear the list. The following lists the commands to modify and view the Excluded Nodes list by using HPC PowerShell or a command prompt.
To view all job properties at a command prompt, type: job view <yourJobID> /detailed
Monitoring excluded nodes on the cluster
To see all excluded nodes on a cluster, use the Get-HpcJob PowerShell cmdlet. The following example shows how to list all of the excluded nodes for jobs that were submitted today. The script also lists the job template that was used for the job that excluded the node. In the following cmdlet, <today’s date> is specified in a date format such as mm/dd/yyyy:
If the cluster administrator detects and resolves the issue on one or more nodes, the administrator can remove the fixed node from any node exclusion list in which it appears. The following cmdlet gets all active jobs and removes the fixed nodes from the node exclusion lists (this has no effect on jobs that do not list the specified nodes):
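The two administrator tasks above, listing today's excluded nodes with their job templates and removing repaired nodes from every active job, can be expressed over plain job records in Python. This illustrates the logic only and does not replace the Get-HpcJob pipeline: the record fields (id, state, template, submit_time, excluded_nodes) are hypothetical, and the sketch changes data in memory only.

```python
from datetime import date
from typing import Iterable, List, Tuple

ACTIVE_STATES = ("Queued", "Running")


def excluded_nodes_today(jobs: List[dict], today: date) -> List[Tuple[str, int, str]]:
    """List (node, job id, job template) for jobs submitted on `today`."""
    rows = []
    for job in jobs:
        if job["submit_time"].date() == today:
            for node in job["excluded_nodes"]:
                rows.append((node, job["id"], job["template"]))
    return sorted(rows)


def remove_fixed_nodes(jobs: List[dict], fixed: Iterable[str]) -> List[int]:
    """Drop repaired nodes from every active job's exclusion list.

    Jobs that do not list any of the nodes are left unchanged.
    Returns the IDs of the jobs that were modified.
    """
    fixed_set = set(fixed)
    changed = []
    for job in jobs:
        if job["state"] not in ACTIVE_STATES:
            continue
        remaining = [n for n in job["excluded_nodes"] if n not in fixed_set]
        if len(remaining) != len(job["excluded_nodes"]):
            job["excluded_nodes"] = remaining
            changed.append(job["id"])
    return changed
```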
Receive notification when your job is done
Scenario
You have submitted a long-running job to the cluster, and you would like to be notified when the job is done.
Goal
Enable email notification on the cluster and submit a job that requests notification on job completion.
Requirements
A head node with Windows HPC Server 2008 R2 installed.
Administrative permissions on the cluster.
Steps
Enable email notification on the cluster:
In HPC Cluster Manager, click Options, and then click Job Scheduler Configuration.
On the E-Mail notifications tab, specify the SMTP server, authentication, and originating address. For important considerations, click the “More about email notifications” link on the tab.
Submit a job that requests notification on completion:
In HPC Cluster Manager, in Job Management, click New Job.
In Job run options, in Send a notification, select the Completes check box.
Add one or more tasks to the job, and then click Submit.
Expected results
If notification is selected for a specific job and email notification is enabled on the cluster, job owners receive the requested email messages at the email account that is associated with their domain credentials.
Related Resources
None.
Provision or clean up the nodes that are allocated to your job
Scenario
You want to perform some basic provisioning of the nodes that are allocated to your job. For example, you may want to copy files or verify the running environment before your primary tasks run. To prepare the nodes that are allocated to your job, you can add a Node Preparation task to your job.
After your tasks complete, you need to collect data or log files from the nodes that were allocated to your job or return the nodes to their pre-job state. To clean up nodes after running your primary tasks, you can add a Node Release task to your job.
Goal
Submit a job with Node Preparation and Node Release tasks.
Requirements
A Windows HPC Server 2008 R2 cluster with at least one compute node.
Steps
Create a new job, and add a Node Preparation task to it.
Note
If you are using HPC PowerShell or a Command Prompt window, use the -Type parameter to designate a Node Preparation task, for example:
Add-HpcTask -JobId <ID> -Type NodePrep
job add <ID> -Type:"NodePrep"
Add one or more primary tasks (Basic or Parametric Sweep) to the job.
Add a Node Release task.
Note
If you are using HPC PowerShell or a Command Prompt window, specify a task with Type set to NodeRelease.
Submit the job.
Now try to cancel a Running job that includes a Node Release task.
Expected results
The Node Preparation task runs on each node before the Basic or Parametric Sweep tasks.
If a Node Preparation task fails to run on a node, that node is not added to the job.
The Node Release task runs on each node as it is released from the job.
The Node Release task runs if the job is canceled or preempted.
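The ordering rules in these expected results can be summarized in a small Python simulation. It models the documented behavior and is not scheduler code: the event names, the round-robin placement of primary tasks, and the cancel_after and force_cancel parameters (which stand in for canceling or force canceling a Running job) are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def run_job(nodes: List[str], prep_ok: Dict[str, bool], primary_tasks: int,
            cancel_after: Optional[int] = None,
            force_cancel: bool = False) -> List[Tuple[str, str]]:
    """Return the (event, node) sequence for one job.

    prep_ok: node -> whether its Node Preparation task succeeds (default True).
    cancel_after: cancel the job after this many primary tasks (None = no cancel).
    """
    events: List[Tuple[str, str]] = []
    joined: List[str] = []
    # Node Preparation runs first on every allocated node.
    for node in nodes:
        events.append(("prep", node))
        if prep_ok.get(node, True):
            joined.append(node)  # a node whose preparation fails is not added
    canceled = False
    if joined:
        for i in range(primary_tasks):
            if cancel_after is not None and i >= cancel_after:
                canceled = True  # remaining primary tasks never start
                break
            # Hypothetical placement: round-robin over the nodes in the job.
            events.append(("task", joined[i % len(joined)]))
    # Node Release runs on each node as it is released, including after a
    # normal cancel; a force cancel skips it.
    if not (canceled and force_cancel):
        for node in joined:
            events.append(("release", node))
    return events
```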
Provide custom job progress information
Scenario
Many of the applications that you run on your cluster run for a long time and consist of many internal stages. To monitor job progress more closely, you want to see the percentage of completion or the internal state of the application (such as data file loaded, running simulation, or writing data).
You can include commands in your application or script files to set and maintain custom job progress information with the Progress and Progress Message job properties.
Progress is an integer between 0 and 100 that represents the percentage of the job that is complete.
Progress Message is a string up to 80 characters that can display a custom status message.
Goal
Set and maintain values for job Progress and Progress Message from an application or script.
Requirements
A Windows HPC Server 2008 R2 cluster with at least one compute node.
User permissions on the cluster.
Steps
Include commands to set Progress and Progress Message in your scripts or applications. For example, if your application includes a loop that performs some work, you can update the progress properties at each iteration.
To set the Progress and Progress Message properties in a batch (.bat) file, an HPC PowerShell script (.ps1), or an application, you can use the %CCP_JOBID% environment variable to get the job ID of the current job, as follows:
In a .bat file, use the job modify command, for example:
You can view custom progress information in HPC Job Manager, HPC PowerShell, or a Command Prompt window.
By default, the HPC Job Scheduler Service sets and maintains the value for the Progress job property. The service stops updating Progress for a job if you provide a value for Progress through the command-line interface, HPC PowerShell, or the APIs.
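The following Python sketch applies the same idea from an application or script: it reads the job ID from the CCP_JOBID environment variable, keeps Progress between 0 and 100 and Progress Message within 80 characters, and calls the job modify command. The /progress: and /progressmsg: option names are an assumption about the job modify syntax; confirm them with job modify /? on your cluster before relying on this sketch.

```python
import os
import subprocess
from typing import List

MAX_MESSAGE_LENGTH = 80  # Progress Message limit described above


def progress_args(job_id: str, percent: float, message: str) -> List[str]:
    """Build a job modify command line with validated progress values."""
    percent = max(0, min(100, int(percent)))  # Progress is an integer from 0 to 100
    message = message[:MAX_MESSAGE_LENGTH]
    # Assumed option names; check `job modify /?` on your cluster.
    return ["job", "modify", str(job_id),
            f"/progress:{percent}", f"/progressmsg:{message}"]


def report_progress(percent: float, message: str) -> None:
    """Update the current job; works only inside an HPC job (CCP_JOBID set)."""
    job_id = os.environ["CCP_JOBID"]
    subprocess.run(progress_args(job_id, percent, message), check=True)
```

In a loop that performs N stages of work, calling report_progress(100 * step / N, f"Running stage {step} of {N}") at the end of each iteration keeps both properties current.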
Allow canceled tasks time to save state information or clean up before exiting
Scenario
When a running task is stopped during execution, you want to allow time for the application to save state information, write a log message, create or delete files, or for services to finish computation of their current service call. You can configure the amount of time, in seconds, to allow applications to exit gracefully by setting the Task Cancelation Grace Period cluster property. The default Task Cancelation Grace Period is 15 seconds.
Important
In Windows HPC Server 2008 R2, the HPC Node Manager Service stops a running task by sending a CTRL_BREAK signal to the application. To use the grace period, the application must process the CTRL_BREAK event. If the application does not process the event, the task exits immediately. For a service to use the grace period, it must process the ServiceContext.OnExiting event.
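To see what processing the CTRL_BREAK event can look like, here is a minimal Python sketch of a task that saves its state when canceled. Python on Windows delivers CTRL_BREAK as signal.SIGBREAK; the sketch falls back to SIGINT elsewhere only so that it can be run and tested off the cluster. The class, the JSON state file, and exit code 1 are hypothetical choices, and the handler must finish within the Task Cancelation Grace Period.

```python
import json
import signal
import sys
import time

# On Windows, Python maps the CTRL_BREAK event to signal.SIGBREAK.
# Falling back to SIGINT lets the sketch run on other platforms.
BREAK_SIGNAL = getattr(signal, "SIGBREAK", signal.SIGINT)


class CheckpointingTask:
    """Hypothetical long-running task that saves its progress when canceled."""

    def __init__(self, state_path: str) -> None:
        self.state_path = state_path
        self.iteration = 0

    def save_state(self) -> None:
        with open(self.state_path, "w") as f:
            json.dump({"iteration": self.iteration}, f)

    def on_break(self, signum, frame) -> None:
        # Runs inside the grace period: save state quickly, then exit.
        self.save_state()
        sys.exit(1)

    def run(self, iterations: int) -> None:
        signal.signal(BREAK_SIGNAL, self.on_break)
        for i in range(iterations):
            self.iteration = i
            time.sleep(1)  # placeholder for one unit of real work
        self.save_state()
```

If the task is canceled during run(), on_break writes the current iteration to the state file before the process exits, which is the behavior the Steps in this scenario ask you to verify.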
Goal
Allow tasks that are canceled time to perform cleanup or completion steps before exiting.
Requirements
A Windows HPC Server 2008 R2 cluster with at least one compute node.
Administrative permissions on the cluster.
Steps
Submit a job that runs an application that includes code to handle a CTRL_BREAK event.
Cancel the job while it is running.
Verify that the actions in the application’s CTRL_BREAK event handler were performed.
You can use one of the following methods to change the Task Cancelation Grace Period to 10 seconds:
Run HPC PowerShell as an administrator, and then type:
Set-HpcClusterProperty -TaskCancelGracePeriod 10
Open a Command Prompt window as an administrator, and then type:
cluscfg setparams TaskCancelGracePeriod=10
Expected results
Canceled tasks that do not handle CTRL_BREAK events exit immediately.
Canceled tasks that include code to handle CTRL_BREAK events can exit gracefully.