Before you deploy Azure CycleCloud in a production environment, you need to carefully plan your infrastructure, configuration, and operational processes. This article provides guidance on key decisions and requirements to ensure a successful and reliable CycleCloud deployment. It covers initial setup, application integration, data management, and disaster recovery.
Azure CycleCloud deployment
- Choose the version of CycleCloud to deploy
- Prepare your Azure subscription by choosing the subscription, virtual network, subnet, and resource group for the CycleCloud server deployment
- Choose the resource group to host clusters or let CycleCloud create the resource group (default setting)
- Create a storage account for locker access
- Decide if you want to use SSH keys, Microsoft Entra ID, or LDAP for authentication
- Decide if CycleCloud should use a Service Principal or a Managed Identity (recommended with a single subscription): Choosing between a Service Principal and a Managed Identity
- Confirm which SKU to use for CycleCloud: CycleCloud System Requirements
- Decide if you want to deploy the environment in a locked down network. If so, consider the following requirements: Operating in a locked down network
- Deploy the CycleCloud server
Warning
Don't select "Enable hierarchical namespace" (Azure Data Lake Storage Gen2) when you create the storage account. CycleCloud can't use a Blob storage account with ADLS Gen2 enabled as a storage locker.
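As a pre-deployment check for the warning above, the following minimal Python sketch inspects a storage account description and reports whether hierarchical namespace is enabled. It assumes the flag appears as `isHnsEnabled`, either at the top level (as in Azure CLI output) or nested under `properties` (as in the ARM REST representation), and it assumes that a missing or null flag means the feature is off. The function name and the sample account are hypothetical.

```python
import json


def is_locker_compatible(account: dict) -> bool:
    """Return True when hierarchical namespace is not enabled on the account."""
    # Assumption: the flag is named "isHnsEnabled" and sits either at the
    # top level or under "properties". A missing or null value is treated
    # as "not enabled".
    props = account.get("properties") or {}
    top_level = account.get("isHnsEnabled")
    nested = props.get("isHnsEnabled")
    return not (top_level or nested)


# Hypothetical sample, shaped like the JSON a CLI "show" command might return.
sample = json.loads(
    '{"name": "examplelocker", "kind": "StorageV2", "isHnsEnabled": false}'
)
print(is_locker_compatible(sample))  # True
```

A check like this can run in a deployment pipeline before the locker is created, so that an incompatible account is caught early.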
Azure CycleCloud configuration
- Sign in to the CycleCloud server and create a site and a CycleCloud admin account: CycleCloud Setup
- Create a CycleCloud locker that points to the storage account
Azure CycleCloud cluster configuration
- Define user access to the clusters: Cluster User Management
- Choose the scheduler to use
- Choose the version for the scheduler and head node
- Choose the versions for the compute and execute nodes. This choice depends entirely on the application you're running.
- Decide whether to deploy clusters using a template or manually:
  - Define and upload cluster templates to the locker: Cluster Template Reference
  - Manually create a cluster: Create a New Cluster
- Decide if you need to run any scripts on the scheduler or execute nodes after they're deployed
Applications
- What dependencies (libraries, and so on) do the applications have? How will you make these dependencies available?
- How long does it take to set up and install an application? This factor might determine how you make the application available to the execution nodes. It might also require a custom image.
- Are there any license dependencies that you need to consider? Does the application need to contact an on-premises license server?
- Where will the applications run from? This choice depends on install times and performance requirements:
  - From a custom image
  - From a marketplace image
  - From an NFS share, Blob storage, or Azure NetApp Files
- Do the applications need to run on a specific VM version? Is MPI a requirement? If it is, you'll need a different family of machines, like the H series.
- What's the best number of cores per job for each application?
- Can you use Spot VMs? See Using Spot VMs in CycleCloud
- Make sure you have the right subscription quotas to meet the core requirements for the applications.
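The quota check in the last item is simple arithmetic: for each VM family, multiply the cores per VM by the maximum node count and compare the total with the quota that is still available. The sketch below illustrates this under stated assumptions: quota is tracked per VM family, and the family names, core counts, and limits are made-up values. Take real limits from your subscription.

```python
from dataclasses import dataclass


@dataclass
class NodeArray:
    vm_family: str     # hypothetical family label used as the quota key
    cores_per_vm: int  # cores on one VM of this size
    max_nodes: int     # largest node count the cluster may scale to


def required_cores(arrays):
    """Sum the peak core demand per VM family."""
    totals = {}
    for a in arrays:
        totals[a.vm_family] = totals.get(a.vm_family, 0) + a.cores_per_vm * a.max_nodes
    return totals


def quota_shortfalls(arrays, quota_limits, cores_in_use=None):
    """Return {family: missing_cores} for families whose quota is too small."""
    cores_in_use = cores_in_use or {}
    shortfalls = {}
    for family, needed in required_cores(arrays).items():
        available = quota_limits.get(family, 0) - cores_in_use.get(family, 0)
        if needed > available:
            shortfalls[family] = needed - available
    return shortfalls


# Illustrative numbers only.
arrays = [
    NodeArray("hpc-family", cores_per_vm=120, max_nodes=10),     # 1200 cores
    NodeArray("general-family", cores_per_vm=16, max_nodes=50),  # 800 cores
]
limits = {"hpc-family": 1000, "general-family": 1000}
print(quota_shortfalls(arrays, limits))  # {'hpc-family': 200}
```

An empty result means the planned cluster fits within the limits supplied. Any entry is the number of additional cores to request for that family before deployment.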
Data
- Determine where in Azure the input data resides. The right choice depends on the performance requirements of the applications and the size of the data:
  - Locally on the execute nodes
  - From an NFS share
  - In Blob storage
  - Using Azure NetApp Files
- Determine if there's any post-processing needed on the output data
- Decide where the output data resides once processing is complete
- Decide if the output data needs to be copied elsewhere
- Determine archive and backup requirements
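When you weigh these data placement options, a rough estimate of how long it takes to stage or copy a dataset can help. The following sketch is a back-of-the-envelope calculation, not a benchmark. The function name, the `efficiency` factor (the fraction of nominal throughput you expect to sustain), and the sample numbers are all assumptions. Measure real throughput for your storage and VM sizes.

```python
def staging_hours(data_gib: float, throughput_mib_s: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move data_gib at a nominal throughput.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the nominal
    throughput that is sustained in practice.
    """
    if data_gib < 0:
        raise ValueError("data_gib must not be negative")
    if throughput_mib_s <= 0:
        raise ValueError("throughput_mib_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_gib * 1024 / (throughput_mib_s * efficiency)
    return seconds / 3600


# Illustrative: 2 TiB of input at a nominal 500 MiB/s, sustaining 80% of it.
print(f"{staging_hours(2048, 500, efficiency=0.8):.2f} hours")  # 1.46 hours
```

If the estimate is large compared with the job runtime, reading from a shared file system in place might be preferable to copying data to each execute node.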
Job submission
- How do users submit jobs?
- Do users have a script to run on the scheduler VM, or is there a frontend to help with data upload and job submission?
Backup and disaster recovery
- Will you use templates for cluster creation? Using templates makes recreating a CycleCloud server faster and keeps deployments consistent.
- What are your disaster recovery requirements? What would happen to your business if an Azure region became unavailable when you needed it?
- Has your business defined any internal application SLAs?
- Can you use another region as a standby?
- Are your jobs long running? Would checkpointing help?
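To illustrate the checkpointing question, the following minimal Python sketch shows one common pattern: a job periodically writes its state to a file, and on restart it resumes from the last saved state instead of starting over. The file layout, function names, and the toy workload (summing integers) are hypothetical. A real application would typically use its own checkpoint mechanism and write to storage that survives the loss of the node.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so a crash mid-write can't corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename within the same file system


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run_job(path, steps, checkpoint_every=100, stop_after=None):
    """Sum 0..steps-1, resuming from the last checkpoint if one exists.

    stop_after simulates an interruption (for example, a node eviction):
    the function returns None and any work since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], steps):
        if stop_after is not None and step >= stop_after:
            return None
        state["total"] += step
        state["next_step"] = step + 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.ckpt.json")
    run_job(ckpt, steps=1000, stop_after=250)      # interrupted at step 250
    print(load_checkpoint(ckpt)["next_step"])      # 200
    print(run_job(ckpt, steps=1000))               # 499500
```

In this run, the interrupted job loses only the 50 steps completed after the last checkpoint. The checkpoint interval trades off the time spent writing state against the amount of work repeated after a failure.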