Plan your CycleCloud production deployment

Before you deploy Azure CycleCloud in a production environment, you need to carefully plan your infrastructure, configuration, and operational processes. This article provides guidance on key decisions and requirements to ensure a successful and reliable CycleCloud deployment. It covers initial setup, application integration, data management, and disaster recovery.

Azure CycleCloud deployment

Warning

Don't set "Enable hierarchical namespace" for Azure Data Lake Storage Gen 2 during storage account creation. CycleCloud can't use Blob storage with ADLS Gen 2 enabled as a storage Locker.

Azure CycleCloud Configuration

Azure CycleCloud cluster configuration

  • Define user access to the clusters (see Cluster User Management)
  • Choose the scheduler to use
  • Choose the version for the scheduler and head node
  • Choose the versions for the compute and execute nodes. This choice depends entirely on the application you're running.
  • Decide whether to deploy clusters using a template or manually
  • Decide if you need to run any scripts on the scheduler or execute nodes once deployed

Applications

  • What dependencies (libraries, and so on) do the applications have? How will you make these dependencies available?
  • How long does it take to set up and install an application? This factor might determine how you make the application available to the execution nodes. It might also require a custom image.
  • Are there any license dependencies that you need to consider? Does the application need to contact an on-premises license server?
  • Where will you execute the applications? This choice depends on install times and performance requirements.
  • Is there a specific VM version you need to use for the applications to run on? Is MPI a requirement? If it is, you'll need a different family of machines, like the H series.
  • What's the best number of cores per job for each application?
  • Can you use Spot VMs? See Using Spot VMs in CycleCloud.
  • Make sure you have the right subscription quotas to meet the core requirements for the applications.
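
The quota check in the last item comes down to simple arithmetic: cores per job multiplied by the number of jobs you expect to run at once, totaled per VM family, because Azure tracks vCPU quota per region and per VM family. The following is a minimal planning sketch, not part of CycleCloud. The AppProfile type, the application names, the VM family labels, and the quota numbers are all hypothetical placeholders. Replace them with your own measurements and the limits shown for your subscription.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical per-application sizing inputs (not a CycleCloud type)."""
    name: str
    vm_family: str             # quota is tracked per VM family
    cores_per_job: int
    peak_concurrent_jobs: int


def required_cores_by_family(apps):
    """Sum the peak core demand of all applications, grouped by VM family."""
    totals = {}
    for app in apps:
        demand = app.cores_per_job * app.peak_concurrent_jobs
        totals[app.vm_family] = totals.get(app.vm_family, 0) + demand
    return totals


def quota_shortfalls(required, quota_limits):
    """Return the families whose demand exceeds quota, with the missing cores."""
    return {
        family: needed - quota_limits.get(family, 0)
        for family, needed in required.items()
        if needed > quota_limits.get(family, 0)
    }


# Illustrative inputs only: names and numbers are assumptions.
apps = [
    AppProfile("cfd-solver", "HBv3", cores_per_job=240, peak_concurrent_jobs=4),
    AppProfile("post-process", "Dv5", cores_per_job=8, peak_concurrent_jobs=20),
]
quota_limits = {"HBv3": 600, "Dv5": 350}

required = required_cores_by_family(apps)
shortfalls = quota_shortfalls(required, quota_limits)
print("Required cores per family:", required)
print("Quota increase needed:", shortfalls)
```

A non-empty shortfall result means you should request a quota increase for that family before the production rollout. If you plan to use Spot VMs, check the Spot quota separately.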

Data

  • Determine where in Azure the input data resides. This determination depends on the performance of the applications and data size.
    • Locally on the execute nodes
    • From an NFS share
    • In blob storage
    • Using Azure NetApp Files
  • Determine if there's any post-processing needed on the output data
  • Decide where the output data resides once processing is complete
  • Decide if the output data needs to be copied elsewhere
  • Determine archive and backup requirements
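
When you compare the storage options above, a rough estimate of how long it takes to stage input data or copy output data can help. The sketch below only divides data size by sustained throughput. The function name and the throughput figures are assumptions for illustration, not measured values for any Azure service. Benchmark your own workload before you rely on the result.

```python
def staging_time_hours(data_size_gib, throughput_mib_per_s):
    """Estimate the hours needed to move data at a sustained throughput."""
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_size_gib * 1024 / throughput_mib_per_s
    return seconds / 3600


# Placeholder throughput values; replace them with your own benchmarks.
candidate_throughputs = {
    "option A (assumed 128 MiB/s)": 128,
    "option B (assumed 512 MiB/s)": 512,
}

input_size_gib = 2048  # example: a 2 TiB input data set
for label, throughput in candidate_throughputs.items():
    hours = staging_time_hours(input_size_gib, throughput)
    print(f"{label}: about {hours:.1f} hours to stage {input_size_gib} GiB")
```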

Job Submission

  • How do users submit jobs?
  • Do users have a script to run on the scheduler VM, or is there a frontend to help with data upload and job submission?

Backup and disaster recovery

  • Will you use templates for cluster creation? Using templates makes recreating a CycleCloud server faster and keeps deployments consistent.
  • What are your disaster recovery requirements? What would happen to your business if an Azure region were unavailable when you needed it?
  • Has your business defined any internal application SLAs?
  • Can you use another region as a standby?
  • Are your jobs long running? Would checkpointing help?
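
To show what checkpointing means for a long-running job, here's a minimal sketch of the general pattern: save progress periodically and resume from the last saved state after an interruption, such as a node failure or a Spot VM eviction. This is a generic illustration, not a CycleCloud or scheduler feature. The function names, the JSON file format, and the stop_after parameter (which only simulates an interruption) are assumptions. Many applications provide their own checkpoint mechanism, which you should prefer when it exists.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the checkpoint.

    The rename is atomic on the same file system, so an interruption
    never leaves a partially written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_job(path, steps, checkpoint_every=10, stop_after=None):
    """Run `steps` units of work, resuming from the checkpoint if present.

    `stop_after` is a hypothetical test hook that simulates an interruption.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], steps):
        if stop_after is not None and step >= stop_after:
            return state  # simulated interruption: unsaved progress is lost
        state["total"] += step  # placeholder for the real computation
        state["next_step"] = step + 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    checkpoint_path = os.path.join(workdir, "job.ckpt.json")
    run_job(checkpoint_path, steps=100, stop_after=37)  # interrupted run
    resumed_from = load_checkpoint(checkpoint_path)["next_step"]
    final_state = run_job(checkpoint_path, steps=100)   # resumed run
    print("Resumed from step:", resumed_from)
    print("Final state:", final_state)
```

For the checkpoint to survive the loss of a node, store it on shared or durable storage rather than on the node's local disk.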