HTCondor

2025-06-10

You can enable HTCondor on a CycleCloud cluster by modifying the run_list in the configuration section of your cluster definition. There are three basic components of an HTCondor cluster. The first is the central manager, which provides the scheduling and management daemons. The second component is one or more schedulers, from which jobs are submitted into the system. The final component is one or more execute nodes, which are the hosts that perform the computation. A simple HTCondor template might look like:

[cluster htcondor]

  [[node manager]]
  ImageName = cycle.image.centos7
  MachineType = Standard_A4 # 8 cores

      [[[configuration]]]
      run_list = role[central_manager]

  [[node scheduler]]
  ImageName = cycle.image.centos7
  MachineType = Standard_A4 # 8 cores

      [[[configuration]]]
      run_list = role[condor_scheduler_role],role[filer_role],role[scheduler]

  [[nodearray execute]]
  ImageName = cycle.image.centos7
  MachineType = Standard_A1 # 1 core
  Count = 1

      [[[configuration]]]
      run_list = role[usc_execute]

When you import and start a cluster with this definition in CycleCloud, you get a manager and a scheduler node, and one execute node. You can add execute nodes to the cluster by using the cyclecloud add_node command. To add 10 more execute nodes, use the following command:

cyclecloud add_node htcondor -t execute -c 10

HTCondor Autoscaling

CycleCloud supports autoscaling for HTCondor. The software monitors the status of your queue and turns on and off nodes as needed to complete the work in an optimal amount of time and cost. To enable autoscaling for HTCondor, add Autoscale=true to your cluster definition:

[cluster htcondor]
Autoscale = True

HTCondor Advanced Usage

If you know the average runtime of jobs, define average_runtime (in minutes) in your job. CycleCloud uses that value to start the minimum number of nodes. For example, if five 10-minute jobs are submitted and average_runtime is set to 10, CycleCloud starts only one node instead of five.

Autoscale Nodearray

By default, HTCondor requests cores from the nodearray called execute. If a job requires a different nodearray (for example, if certain jobs within a workflow have a high memory requirement), specify a slot_type attribute for the job. For example, adding +slot_type = "highmemory" causes HTCondor to request a node from the highmemory nodearray instead of execute (this setting currently requires htcondor.slot_type = "highmemory" to be set in the nodearray's [[[configuration]]] section). This setting doesn't affect how HTCondor schedules the jobs, so you might want to include the slot_type startd attribute in the job's requirements or rank expressions. For example: Requirements = target.slot_type = "highmemory".

Submitting Jobs to HTCondor

The most generic way to submit jobs to an HTCondor scheduler is the command (run from a scheduler node):

condor_submit my_job.submit

A sample submit file might look like this:

      Universe = vanilla
      Executable = do_science
      Arguments = -v --win-prize=true
      Output = log/$(Cluster).$(Process).out
      Error = log/$(Cluster).$(Process).err
      Should_transfer_files = if_needed
      When_to_transfer_output = On_exit
      +average_runtime = 10
      +slot_type = "highmemory"
      Queue

HTCondor Configuration Reference

The following HTCondor-specific configuration options customize functionality:

HTCondor-Specific Configuration Options	Description
htcondor.agent_enabled	If true, use the condor_agent for job submission and polling. Default: false
htcondor.agent_version	The version of the condor_agent to use. Default: 1.27
htcondor.classad_lifetime	The default lifetime of classads (in seconds). Default: 700
htcondor.condor_owner	The Linux account that owns the HTCondor scaledown scripts. Default: root
htcondor.condor_group	The Linux group that owns the HTCondor scaledown scripts. Default: root
htcondor.data_dir	The directory for logs, spool directories, execute directories, and local config file. Default: /mnt/condor_data (Linux), C:\All Services\condor_local (Windows)
htcondor.ignore_hyperthreads	(Windows only) Set the number of CPUs to half of the detected CPUs to "disable" hyperthreading. If using autoscale, specify the non-hyperthread core count with the `Cores` configuration setting in the [[node]] or [[nodearray]] section. Default: false
htcondor.install_dir	The directory that HTCondor is installed to. Default: /opt/condor (Linux), C:\condor (Windows)
htcondor.job_start_count	The number of jobs a schedd starts per cycle. 0 is unlimited. Default: 20
htcondor.job_start_delay	The number of seconds between each job start interval. 0 is immediate. Default: 1
htcondor.max_history_log	The maximum size of the job history file in bytes. Default: 20971520
htcondor.max_history_rotations	The maximum number of job history files to keep. Default: 20
htcondor.negotiator_cycle_delay	The minimum number of seconds before a new negotiator cycle can start. Default: 20
htcondor.negotiator_interval	How often (in seconds) the condor_negotiator starts a negotiation cycle. Default: 60
htcondor.negotiator_inform_startd	If true, the negotiator informs the startd when it matches to a job. Default: true
htcondor.remove_stopped_nodes	If true, stopped execute nodes are removed from the CycleServer view instead of being marked as "down".
htcondor.running	If true, HTCondor collector and negotiator daemons run on the central manager. Otherwise, only the condor_master runs. Default: true
htcondor.scheduler_dual	If true, schedulers run two schedds. Default: true
htcondor.single_slot	If true, treats the machine as a single slot (regardless of the number of cores the machine possesses). Default: false
htcondor.slot_type	Defines the slot_type of a node array for autoscaling. Default: execute
htcondor.update_interval	The interval (in seconds) for the startd to publish an update to the collector. Default: 240
htcondor.use_cache_config	If true, use cache_config to have the instance poll CycleServer for configuration. Default: false
htcondor.version	The version of HTCondor to install. Default: 8.2.6

HTCondor Auto-Generated Configuration File

HTCondor has a large number of configuration settings, including user-defined attributes. CycleCloud offers the ability to create a custom configuration file using attributes defined in the cluster:

Attribute	Description
htcondor.custom_config.enabled	If true, a configuration file is generated using the specified attributes. Default: false
htcondor.custom_config.file_name	The name of the file (placed in `htcondor.data_dir`/config) to write. Default: ZZZ-custom_config.txt
htcondor.custom_config.settings	The attributes to write to the custom config file such as `htcondor.custom_config.settings.max_jobs_running = 5000`

Note

You can't specify HTCondor configuration attributes containing a . using this method. If you need such attributes, specify them in a cookbook or a file installed with cluster-init.

CycleCloud supports a standard set of autostop attributes across schedulers:

Attribute	Description
cyclecloud.cluster.autoscale.stop_enabled	Enables autostop on this node. [true/false]
cyclecloud.cluster.autoscale.idle_time_after_jobs	The amount of time (in seconds) for a node to sit idle after completing jobs before it autostops.
cyclecloud.cluster.autoscale.idle_time_before_jobs	The amount of time (in seconds) for a node to sit idle before completing jobs before it autostops.