Slurm scripts are essential for submitting and managing jobs in a high-performance computing (HPC) environment. Slurm (Simple Linux Utility for Resource Management) is a widely-used, open-source workload manager that helps allocate resources and schedule jobs on HPC clusters.
Basic Structure of a Slurm Script
A Slurm script is a Bash script that begins with a shebang line (#!/bin/bash) followed by #SBATCH directives, which specify job parameters for the Slurm scheduler. The directives must appear before any executable commands.
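As a minimal sketch of that shape (the job name and time limit below are placeholders, not values required by Slurm):

```shell
#!/bin/bash
#SBATCH --job-name=minimal_job   # directives look like comments to Bash,
#SBATCH --time=00:05:00          # but are read by the Slurm scheduler
# Executable commands follow the directive block
echo "Running on $(hostname)"
```

Because #SBATCH lines are ordinary comments to Bash, the same file also runs directly outside Slurm, which is handy for quick local testing.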
######################################
Example Slurm Script for ASL-cpu Node
######################################
#!/bin/bash
#SBATCH --job-name=cpu_test # Job name
#SBATCH --output=%x_%j.out # Standard output file
#SBATCH --error=%x_%j.err # Standard error file
#SBATCH --partition=ASL-cpu # Partition name for CPU jobs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --cpus-per-task=1 # Number of CPU cores per task
#SBATCH --mem=1G # Memory allocation
#SBATCH --time=00:10:00 # Maximum runtime (HH:MM:SS)
sleep 60                     # Simulate a short workload
echo "Hello World"           # Written to the --output file
echo "Hello Error" 1>&2      # Written to the --error file
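In the --output and --error filename patterns, Slurm expands %x to the job name and %j to the numeric job ID assigned at submission. For the script above, with a hypothetical job ID of 12345, the resulting filenames can be traced like this:

```shell
# %x -> job name, %j -> job ID assigned by Slurm at submission time.
job_name=cpu_test
job_id=12345                       # hypothetical ID for illustration
echo "${job_name}_${job_id}.out"   # stdout file: cpu_test_12345.out
echo "${job_name}_${job_id}.err"   # stderr file: cpu_test_12345.err
```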
#######################################
Example Slurm Script for ASL-gpu Node
#######################################
#!/bin/bash
#SBATCH --job-name=gpu_test # Job name
#SBATCH --output=%x_%j.out # Standard output file
#SBATCH --error=%x_%j.err # Standard error file
#SBATCH --partition=ASL-gpu # Partition name for GPU jobs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --gres=gpu:1 # Number of GPUs needed
#SBATCH --time=00:30:00 # Maximum runtime (HH:MM:SS)
# Load any necessary modules
# module load cuda/11.8
# Run your GPU job commands here
python my_gpu_script.py
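On clusters where the GPU gres plugin is configured, Slurm sets CUDA_VISIBLE_DEVICES inside the job to the index of each allocated GPU. A short sanity check like the following (a sketch, not part of the original script) can be placed before the real work:

```shell
# Slurm exports CUDA_VISIBLE_DEVICES for jobs that requested --gres=gpu:N.
if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "Allocated GPU(s): $CUDA_VISIBLE_DEVICES"
else
    echo "No GPU visible - check the --gres request" >&2
fi
```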
Explanation of Key Directives
#SBATCH: Marks a line as an option for the Slurm scheduler.
--job-name: A custom name for your job.
--output and --error: File paths for the job's standard output and error logs.
--partition: The partition (queue) to use (e.g., ASL-cpu or ASL-gpu).
--nodes: Number of nodes needed.
--ntasks: Number of tasks to run.
--cpus-per-task: CPU cores allocated per task.
--gres: Generic resources to request, such as GPUs.
--time: Maximum runtime for the job.
--mem: Memory allocation for the job.
Submitting Your Slurm Script
Save your Slurm script as a file (e.g., my_job.slurm) and submit it using:
sbatch my_job.slurm
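Once submitted, the job can be tracked with Slurm's standard commands (the job ID 12345 below is a placeholder; these commands only work on a cluster running Slurm):

```shell
squeue -u $USER   # List your queued and running jobs (ID, partition, state, runtime)
scancel 12345     # Cancel a job by its ID
sacct -j 12345    # After completion, query the job's accounting record
```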
Job Arrays in Slurm
To run a job array, use the --array directive:
#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --time=2:00:00
#SBATCH --array=1-10 # Task ID range
#SBATCH --ntasks=1 # One task per job
#SBATCH --partition=ASL-cpu
# Command to run
python my_script.py $SLURM_ARRAY_TASK_ID
Explanation: The job array runs 10 separate jobs, each receiving a different value of SLURM_ARRAY_TASK_ID (1 through 10).
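A common use of SLURM_ARRAY_TASK_ID is to map each array task to one line of an input list. The sketch below assumes a hypothetical inputs.txt with one filename per line; in a real job Slurm sets the task ID, whereas here it is set manually for illustration:

```shell
# Build a stand-in input list (one filename per line)
printf 'a.dat\nb.dat\nc.dat\n' > inputs.txt
SLURM_ARRAY_TASK_ID=2                               # set by Slurm in a real array job
# Select the line matching this task's ID
input=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt)
echo "$input"    # b.dat
```

With --array=1-3, each task would pick up a different file and they could all be processed in parallel.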