We have made some improvements to the Slurm scheduler on the Balena HPC service. These changes will help free up resources for GPU jobs and provide more resources for jobs with short run times (<6 hours).

The new Slurm partition (or job queue) is called batch-short and can be used for submitting jobs which require 6 hours or less. This partition has more Ivybridge nodes available for these short wall time jobs, which should help them get through the batch system more quickly. A sketch of a submission script is shown below.
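As a rough illustration, a job script for the new partition might look like the following. The job name, wall time and resource values here are placeholders rather than recommendations, and the core count per node should be checked against the HPC Wiki.

    #!/bin/bash
    #SBATCH --job-name=short-test      # job name shown in the queue (placeholder)
    #SBATCH --partition=batch-short    # new partition for jobs of 6 hours or less
    #SBATCH --time=05:00:00            # wall time must be within the 6-hour limit
    #SBATCH --nodes=1                  # example resource request (placeholder)
    #SBATCH --ntasks-per-node=16       # example core count; check the Ivybridge node
                                       # specification on the HPC Wiki

    # Replace with your actual workload
    srun ./my_program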

The GPU and NVMe accelerator nodes have been removed from the standard Ivybridge batch partitions. This prevents long-running CPU-only workloads from occupying the accelerator nodes and blocking them for those who need the GPUs. Jobs which require a GPU node (either NVIDIA K20X or P100) should be submitted directly to the batch-acc partition with the appropriate SBATCH GPU options (see HPC Wiki - Slurm scheduler). The GPU partition, batch-acc, has been prioritised for jobs which require GPUs over those which don't.
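For GPU jobs, a sketch along the following lines would be submitted to batch-acc. The script assumes a generic --gres=gpu request; the exact options for selecting a K20X or P100 node, and suitable wall times, are documented on the HPC Wiki.

    #!/bin/bash
    #SBATCH --job-name=gpu-test        # job name shown in the queue (placeholder)
    #SBATCH --partition=batch-acc      # accelerator partition, prioritised for GPU jobs
    #SBATCH --time=12:00:00            # example wall time (placeholder)
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:1               # generic GPU request; see the HPC Wiki for the
                                       # options to select a K20X or P100 node

    # Replace with your actual GPU workload
    srun ./my_gpu_program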

If you have any questions about this, please contact hpc-support@bath.ac.uk.
