Slurm scripts are essential for submitting and managing jobs in a high-performance computing (HPC) environment. Slurm (Simple Linux Utility for Resource Management) is a widely used, open-source workload manager that allocates resources and schedules jobs on HPC clusters.
Basic Structure of a Slurm Script
A Slurm script is a Bash script that begins with a #!/bin/bash shebang line, followed by #SBATCH directives that specify job parameters for the Slurm scheduler.
######################################
Example Slurm Script for ASL-cpu Node
######################################
#!/bin/bash
#SBATCH --job-name=cpu_test # Job name
#SBATCH --output=%x_%j.out # Standard output file
#SBATCH --error=%x_%j.err # Standard error file
#SBATCH --partition=ASL-cpu # Partition name for CPU jobs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --cpus-per-task=1 # Number of CPU cores per task
#SBATCH --mem=1G # Memory allocation
#SBATCH --time=00:10:00 # Maximum runtime (HH:MM:SS)
# Example commands: wait, then write to stdout and stderr
sleep 60
echo "Hello World"
echo "Hello Error" 1>&2
#######################################
Example Slurm Script for ASL-gpu Node
#######################################
#!/bin/bash
#SBATCH --job-name=gpu_test # Job name
#SBATCH --output=%x_%j.out # Standard output file
#SBATCH --error=%x_%j.err # Standard error file
#SBATCH --partition=ASL-gpu # Partition name for GPU jobs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --gres=gpu:1 # Number of GPUs needed
#SBATCH --time=00:30:00 # Maximum runtime (HH:MM:SS)
# Load any necessary modules
# module load cuda/11.8
# Run your GPU job commands here
python my_gpu_script.py
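Before the Python step, it can help to confirm the job actually sees a GPU. A minimal, optional check (assumes nvidia-smi is on the compute node's PATH; it falls back gracefully if no GPU is visible):

```shell
# Show which GPU(s) Slurm exposed to this job; Slurm sets
# CUDA_VISIBLE_DEVICES when --gres=gpu:N is granted.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
# List the visible GPU(s); prints a fallback message off-cluster.
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
  || echo "no GPU visible"
```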
Explanation of Key Directives
#SBATCH: Prefixes each option for the job submission.
--job-name: A custom name for your job.
--output and --error: File paths for the job's output and error logs (%x expands to the job name, %j to the job ID).
--partition: The partition or queue to use (e.g., ASL-cpu or ASL-gpu).
--nodes: Number of nodes needed.
--ntasks: Number of tasks.
--cpus-per-task: CPU cores allocated per task.
--gres: Generic resources, such as GPUs (e.g., gpu:1).
--time: Maximum runtime for the job (HH:MM:SS).
--mem: Memory allocation for the job (e.g., 1G).
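Inside a running job, Slurm exports these allocations as environment variables (for example, SLURM_CPUS_PER_TASK for --cpus-per-task), which a script can use to size thread pools. A small sketch, with a default of 1 so it also runs outside a job:

```shell
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# default to 1 for local testing outside a Slurm job.
CPUS=${SLURM_CPUS_PER_TASK:-1}
# Many threaded programs respect OMP_NUM_THREADS.
export OMP_NUM_THREADS="$CPUS"
echo "Using $CPUS CPU core(s)"
```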
Submitting Your Slurm Script
Save your Slurm script as a file (e.g., my_job.slurm) and submit it using:
sbatch my_job.slurm
Job Arrays in Slurm
To run a job array, use the --array directive:
#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --time=2:00:00
#SBATCH --array=1-10 # Task ID range
#SBATCH --ntasks=1 # One task per job
#SBATCH --partition=ASL-cpu
# Command to run
python my_script.py $SLURM_ARRAY_TASK_ID
Explanation: The job array runs 10 separate jobs, each with a different SLURM_ARRAY_TASK_ID (1 through 10).
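Each array element typically uses its task ID to select a different input. A minimal sketch of that pattern (the data_N.csv file names are hypothetical):

```shell
#!/bin/bash
# SLURM_ARRAY_TASK_ID is set by Slurm for each array element;
# default to 1 so the script can also be tested locally.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
# Map the task ID to a per-element input file (hypothetical names).
INPUT="data_${TASK_ID}.csv"
echo "Task ${TASK_ID} processing ${INPUT}"
```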