Azure CycleCloud Workspace for Slurm is a free Marketplace application that provides a simple, secure, and scalable way to manage compute and storage resources for HPC and AI workloads. In this quickstart, you install CycleCloud Workspace for Slurm by using the Azure Marketplace application.
The main steps to deploy and configure CycleCloud Workspace for Slurm, including Open OnDemand, are:
- Review these instructions before starting: Plan your CycleCloud Workspace for Slurm Deployment.
- Deploy a CycleCloud Workspace for Slurm environment using Azure Marketplace (this quickstart).
- Register a Microsoft Entra ID application for Open OnDemand authentication: Register a Microsoft Entra ID application for Open OnDemand.
- Configure Open OnDemand to use the Microsoft Entra ID application: Configure Open OnDemand with CycleCloud.
- Add users in CycleCloud: Add users for Open OnDemand.
Prerequisites
For this quickstart, you need:
- An Azure account with an active subscription
- The Contributor and User Access Administrator roles at the subscription level
- A direct connection to the virtual network used by the cluster (that is, not through Azure Bastion) if you need to deploy Open OnDemand
- Permission to register a Microsoft Entra ID application if you need to deploy Open OnDemand
How to deploy
- Sign in to the Azure portal.
- Select + Create a Resource.
- In the Search services and marketplace box, enter Slurm and then select Azure CycleCloud Workspace for Slurm.
- On the Azure CycleCloud Workspace for Slurm page, select Create.
Basics
- On the New Azure CycleCloud Workspace for Slurm account page, enter or select the following details.
- Subscription: Select the subscription to use if it's not already selected.
- Region: Select the Azure region where you want to deploy your CycleCloud Workspace for Slurm environment.
- Resource group: Select the resource group for the Azure CycleCloud Workspace for Slurm account, or create a new one.
- CycleCloud VM Size: Choose a new VM Size or keep the default one.
- Admin User: Enter a name and a password for the CycleCloud administrator account.
- Admin SSH Public Key: Provide the public SSH key of the administrator account, either by entering it directly or by selecting an SSH key resource stored in Azure.
File-system
Users' home directory - Create new
Specify where to put the users' home directory.
Builtin NFS - Uses the scheduler VM as an NFS server with an attached data disk.
Azure NetApp Files - Creates an ANF account, pool, and volume with the specified capacity and service level.
Users' home directory - Use Existing
If you have an existing NFS mount point, select the Use Existing option and specify the settings to mount it.
Supplemental file-system mount - Create new
If you need to mount another file system for your project data, you can either create a new one or specify an existing one. You can create a new Azure NetApp Files volume or an Azure Managed Lustre Filesystem.
Supplemental file-system mount - Use existing
If you have an existing external NFS mount point or an Azure Managed Lustre Filesystem, you can specify the mount options.
Networking
Specify if you want to create a new virtual network and subnets or use an existing one.
Create a new virtual network
- Select the CIDR that corresponds to the number of compute nodes you're targeting and specify a base IP address.
- Create a Bastion if your corporate IT doesn't provide direct connectivity.
- Create a NAT Gateway to provide outbound connectivity to the internet.
- Peer to an existing virtual network if you already have a hub that can deliver services like Bastion and a VPN gateway. Ensure that you select a base IP address compatible with your peered virtual network. If the peered virtual network has a gateway, check the Allow gateway transit option.
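As a rough aid for choosing the CIDR in the first step above, the following sketch prints how many IPv4 addresses a given prefix length contains. The function name `cidr_size` is hypothetical and not part of CycleCloud. The result is the raw address count; the number of compute nodes you can actually place is lower, because the range is split across several subnets and Azure reserves five addresses in each subnet.

```shell
# cidr_size PREFIX_LENGTH: print the number of IPv4 addresses in a block
# with the given prefix length (for example, 24 for a /24).
# Hypothetical helper for illustration; not part of CycleCloud.
cidr_size() {
  local prefix="$1"
  echo $(( 1 << (32 - prefix) ))
}

cidr_size 24   # prints 256
cidr_size 20   # prints 4096
```

Compare the printed count with your target node count, leaving headroom for the scheduler, login, and service subnets.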
Use existing virtual network
Before using an existing virtual network, check the prerequisites in Plan your CycleCloud Workspace for Slurm Deployment.
Specify how to register, in a private DNS zone, the private endpoint of the storage account that stores CycleCloud projects. You can choose to create a new private DNS zone, use an existing one, or not register it.
Slurm settings
Specify the virtual machine size and image for the scheduler and the authentication nodes. The images are HPC images in Azure Marketplace with the following URIs:
Image Name | URI |
---|---|
Alma Linux 8.10 | almalinux:almalinux-hpc:8_10-hpc-gen2:latest |
Ubuntu 20.04 | microsoft-dsvm:ubuntu-hpc:2004:latest |
Ubuntu 22.04 | microsoft-dsvm:ubuntu-hpc:2204:latest |
Custom Image | You must specify an image URN or image ID |
If you choose a Custom Image, specify an image URN for an existing marketplace image or an image ID for an image in an Azure Compute Gallery.
To use the same image for the scheduler, authentication nodes, and compute nodes, select Use image on all nodes.
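The URIs in the table above follow the Azure Marketplace URN pattern `publisher:offer:sku:version`. The following sketch splits a URN into those four fields, which can help when sanity-checking a custom image URN before you enter it. The helper name `split_urn` is hypothetical, and the check is purely syntactic: it doesn't confirm that the image exists.

```shell
# split_urn URN: print the publisher, offer, sku, and version fields of a
# Marketplace image URN. Returns 1 unless the URN has exactly four
# non-empty, colon-separated fields.
# Hypothetical helper for illustration; not part of CycleCloud.
split_urn() {
  local urn="$1" rest publisher offer sku version
  case "$urn" in
    *:*:*:*:*) return 1 ;;   # more than three colons
    *:*:*:*)   ;;            # exactly three colons
    *)         return 1 ;;   # fewer than three colons
  esac
  publisher="${urn%%:*}"; rest="${urn#*:}"
  offer="${rest%%:*}";    rest="${rest#*:}"
  sku="${rest%%:*}";      version="${rest#*:}"
  # Reject empty fields such as "a::b:c".
  [ -n "$publisher" ] && [ -n "$offer" ] && [ -n "$sku" ] && [ -n "$version" ] || return 1
  printf 'publisher=%s\noffer=%s\nsku=%s\nversion=%s\n' \
    "$publisher" "$offer" "$sku" "$version"
}

split_urn "almalinux:almalinux-hpc:8_10-hpc-gen2:latest"
```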
Specify the number of authentication nodes you want to provision initially and the maximum number allowed. When you enable health checks, the solution automatically runs node health checks for the HPC and GPU partitions and removes any unhealthy nodes. You can delay the start of the cluster if you need to configure more settings through the CycleCloud portal.
To enable Slurm Job Accounting, check the box to display connectivity options. This option requires an Azure Database for MySQL flexible server resource that you deployed earlier.
You can connect using an FQDN or private IP if you supply your own virtual network. You can also use virtual network peering when you create a new virtual network as part of your deployment. If you choose to create a new virtual network, you can also connect through a private endpoint.
Partition settings
Azure CycleCloud Workspace for Slurm includes three defined Slurm partitions:
- HTC: For embarrassingly parallel non-MPI jobs.
- HPC: For tightly coupled MPI jobs that mostly use VM types with or without InfiniBand support.
- GPU: For MPI and non-MPI GPU jobs that use VM types with or without InfiniBand support.
You can set the image and the maximum number of nodes for each partition that CycleCloud dynamically creates. Only the HTC partition lets you use spot instances, because spot VMs can be evicted while a job is running, which tightly coupled HPC and GPU jobs tolerate poorly.
Open OnDemand
To use Open OnDemand, select the checkbox and enter the following information:
- the image name,
- the domain name (contoso.com) that the system uses to get the user name (user@contoso.com) and match it with the local Linux account (user) that CycleCloud manages for authentication,
- the fully qualified domain name (FQDN) of the Open OnDemand web server (leave blank if you want to use the private IP),
- whether you plan to use an existing Microsoft Entra ID application or register one manually later.
Automatically register Entra ID application is an extra option that appears only when you use CLI deployment.
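To illustrate how the domain name relates a sign-in name to a local Linux account, the sketch below strips the `@domain` suffix from a sign-in name. It assumes sign-in names of the form `user@contoso.com`; the helper name `local_account` is hypothetical, and the matching that Open OnDemand actually performs might differ.

```shell
# local_account SIGN_IN_NAME DOMAIN: print the local Linux account name for
# a sign-in name in DOMAIN by stripping the "@DOMAIN" suffix.
# Returns 1 if the sign-in name is not in that domain.
# Hypothetical helper; the real matching logic might differ.
local_account() {
  local name="$1" domain="$2"
  case "$name" in
    ?*@"$domain") printf '%s\n' "${name%@*}" ;;
    *) return 1 ;;
  esac
}

local_account "user@contoso.com" "contoso.com"   # prints user
```

A sign-in name from any other domain makes the function return a nonzero status, which mirrors the idea that only accounts in the configured domain map to local users.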
Note
User authentication requires a Microsoft Entra ID application. If the deployment scripts don't create an application, create one manually. For more information, see How to register a Microsoft Entra ID application for Open OnDemand.
Advanced
You can enable availability zones for cluster compute nodes and new file-system resources. Placing compute nodes and storage in the same availability zone ensures minimal latency between them.
Tags
Assign the appropriate tags to the necessary resources. CycleCloud dynamically provisions virtual machines and applies Node Array tags to them.
Review and create
Review your options. This step also includes some validations.
When the validations are complete, select Create to initialize the deployment.
Follow the deployment status and steps.
Check your deployment
Connect to the ccw-cyclecloud-vm using Bastion with the username and SSH keys you specified during the deployment.
After connecting, check the cloud-init logs to verify everything is correct.
$ tail -f -n 25 /var/log/cloud-init-output.log
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Waiting for Azure.MachineType to be populated...
Starting cluster ccws....
----------------------------
ccws : allocation -> started
----------------------------
Resource group:
Cluster nodes:
scheduler: Off -- --
Total nodes: 1
CC start_cluster successful
/
exiting after install
Cloud-init v. 23.4-7.el8_10.alma.1 running 'modules:final' at Wed, 12 Jun 2024 10:15:53 +0000. Up 11.84 seconds.
Cloud-init v. 23.4-7.el8_10.alma.1 finished at Wed, 12 Jun 2024 10:28:15 +0000. Datasource DataSourceAzure [seed=/dev/sr0]. Up 754.29 seconds
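Instead of reading the log by eye, you can search it for the success line. The sketch below treats the `CC start_cluster successful` line from the sample output above as the success marker, which is an assumption based on that sample only; the function name `start_succeeded` is hypothetical.

```shell
# start_succeeded LOG_FILE: return 0 if the log contains the
# "CC start_cluster successful" line shown in the sample output.
# Treating this line as the success marker is an assumption.
start_succeeded() {
  grep -q "CC start_cluster successful" "$1"
}

log=/var/log/cloud-init-output.log
if [ -r "$log" ] && start_succeeded "$log"; then
  echo "cluster start reported as successful"
else
  echo "success line not found, or log not readable: $log"
fi
```

If the success line is missing, review the full log rather than only the last 25 lines, because an earlier step might have failed.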
Next, set up connectivity between your client machine and the CycleCloud VM. Your corporate IT department might need to help you set up connectivity through a VPN, Bastion tunneling, or an attached public IP if your company permits it. Access the web interface by browsing to https://<cyclecloud_ip>. Sign in with the username and password you provided during deployment. Verify that both the scheduler and the sign-in node are running.
Resources
- Register a Microsoft Entra ID application for Open OnDemand
- Configure Open OnDemand with CycleCloud
- Add users for Open OnDemand
- How to connect to the CycleCloud Portal through Bastion
- How to connect to a Login Node through Bastion
- How to deploy a CycleCloud Workspace for Slurm environment using the CLI