Submitting Jobs with Slurm
This document provides an introduction to submitting jobs on Engineering clusters using Slurm. It covers the basics of writing and submitting job scripts, interactive job submissions, and common Slurm parameters. Additionally, there is a section with example submission scripts for several popular programs.
Job Submission Scripts
A job submission script is a Bash script that declares the resources needed for your job and contains the commands to execute your application. The script typically includes:
- Shebang Line: Specifies the shell to interpret the script.
- SBATCH Directives: Define job options like job name, output/error file locations, and resource requests.
- Commands: Load necessary modules and run your program.
Example Submission Script
#!/bin/bash
#SBATCH --job-name=myappjob
#SBATCH --output=screenout.txt
#SBATCH --error=screenerror.txt
#SBATCH --ntasks=2
module load mpi/openmpi-x86_64
mpirun ./myprogram
Note: If you encounter an error such as:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination
please contact linux-engr-helpdesk@tamu.edu.
Submitting a Script
Once you have created your submission script (for example, named myscript.job), submit it to the Slurm scheduler using:
sbatch myscript.job
After submission, you will receive a job ID that you can use to monitor the job's progress.
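A minimal sketch of capturing that job ID in a shell variable, assuming the default `Submitted batch job <id>` message that `sbatch` prints on success. The helper name `extract_job_id` is hypothetical; `myscript.job` is the example file name used above. The submission step is skipped when `sbatch` is not installed or the script is missing.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID out of sbatch's success message,
# which normally reads "Submitted batch job <id>".
extract_job_id() {
    local message="$1"
    # Strip everything up to the last space, leaving the final field (the ID).
    echo "${message##* }"
}

# Submit only when Slurm is available and the script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f myscript.job ]; then
    job_id="$(extract_job_id "$(sbatch myscript.job)")"
    if [ -n "${job_id}" ]; then
        echo "Job ID: ${job_id}"
        # Show the job while it is pending or running.
        squeue -j "${job_id}"
    fi
fi
```

Once the job has left the queue, `squeue` no longer lists it, and the output and error files named in the script are the place to look for results.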
Interactive Job Submissions
Not every job is well-suited for batch submission. For tasks that require interactive input or a graphical interface, you can start an interactive session using srun:
srun --pty /bin/bash
If you need to run GUI applications, include the --x11 flag:
srun --pty --x11 /bin/bash
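Interactive sessions accept the same kind of resource options as batch jobs. The sketch below assembles an `srun` command line and prints it so that it can be reviewed before launching. The function name `interactive_cmd` and the values shown (4 CPUs, a one-hour limit) are illustrative assumptions, not site defaults.

```shell
#!/bin/bash
# Hypothetical helper: print an srun command for an interactive shell.
# $1 = CPUs for the session, $2 = time limit (HH:MM:SS),
# $3 = "x11" to add GUI forwarding (optional).
interactive_cmd() {
    local cpus="$1" time_limit="$2" gui="${3:-}"
    local cmd="srun --ntasks=1 --cpus-per-task=${cpus} --time=${time_limit} --pty"
    if [ "${gui}" = "x11" ]; then
        cmd="${cmd} --x11"
    fi
    echo "${cmd} /bin/bash"
}

# Print a plain session and a GUI-capable session.
interactive_cmd 4 01:00:00
interactive_cmd 4 01:00:00 x11
```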
Slurm Parameters
Below are some common Slurm parameters along with their descriptions and examples:
| Option | Example | Description |
|---|---|---|
| --job-name | --job-name=myjob | Assigns a friendly name to your job. |
| --output | --output=out.txt | Redirects standard output to the specified file. |
| --error | --error=err.txt | Redirects standard error to the specified file. |
| --ntasks | --ntasks=4 | Specifies the number of tasks (processes) required. |
| --cpus-per-task | --cpus-per-task=20 | Specifies the number of CPUs per task (for multithreaded jobs). |
| --partition | --partition=large | Specifies the partition to which the job should be submitted. |
| --qos | --qos=normal | Specifies the Quality of Service (QOS) for the job. |
| --mail-type | --mail-type=END,FAIL | Sets when email notifications are sent. |
| --mail-user | --mail-user=user@domain.com | Specifies the email for notifications. |
For a complete list of options, see the Slurm SBATCH documentation.
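As a sketch of how these options combine, the snippet below writes a job script that uses most of the options from the table and then checks it for shell syntax errors. The file name `combined.job` and the `srun hostname` payload are illustrative assumptions; the partition, QOS, and email values are the placeholder examples from the table and may not exist on your cluster.

```shell
#!/bin/bash
# Write a job script that combines several options from the table.
# Partition, QOS, and email values are placeholders, not site settings.
cat > combined.job <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --output=out.txt
#SBATCH --error=err.txt
#SBATCH --ntasks=4
#SBATCH --partition=large
#SBATCH --qos=normal
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@domain.com

# srun starts one copy of the command per task (4 here).
srun hostname
EOF

# Check the generated script for shell syntax errors without running it.
bash -n combined.job && echo "combined.job: syntax OK"
```

Options given on the `sbatch` command line take precedence over `#SBATCH` lines in the script, so `sbatch --ntasks=8 combined.job` would request eight tasks without editing the file.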
Tasks versus CPUs
Slurm differentiates between tasks (processes) and CPUs (threads):
- Tasks (--ntasks): Used for multi-process programs (e.g., MPI jobs).
- CPUs per task (--cpus-per-task): Used for multithreaded programs (e.g., MATLAB).
A task cannot span multiple compute nodes, so all CPUs requested for one task are allocated on the same node. If your program needs several CPUs on a single node, request them with --cpus-per-task.
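Inside a running job, Slurm exposes these requests as environment variables, which a script can use to size its thread count. The sketch below assumes the program honors `OMP_NUM_THREADS` (common for OpenMP-based software, but not universal) and falls back to 1 when the variables are unset, for example outside a job or when the matching option was not requested.

```shell
#!/bin/bash
# Slurm exports SLURM_NTASKS and SLURM_CPUS_PER_TASK inside a job when the
# matching options were requested. Fall back to 1 when they are unset.
ntasks="${SLURM_NTASKS:-1}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-1}"

# Assumption: the program reads OMP_NUM_THREADS to choose its thread count.
export OMP_NUM_THREADS="${cpus_per_task}"

# Total CPUs allocated to the job is tasks multiplied by CPUs per task.
echo "tasks=${ntasks} cpus_per_task=${cpus_per_task} total_cpus=$((ntasks * cpus_per_task))"
```

For example, a job submitted with --ntasks=2 and --cpus-per-task=4 is allocated eight CPUs in total, four for each task.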
Examples for Popular Programs
Below are sample job submission scripts for various popular applications:
1. MPI-Based Application
For running an MPI job, use multiple tasks to execute your program in parallel:
#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --output=mpi_output.txt
#SBATCH --error=mpi_error.txt
#SBATCH --ntasks=8
module load mpi/openmpi-x86_64
mpirun ./mpi_application
2. Python Script
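For running a Python script that uses multiple processes on one node (for example, via the multiprocessing module), request a single task with several CPUs so that all worker processes share one node:
#!/bin/bash
#SBATCH --job-name=python_job
#SBATCH --output=python_output.txt
#SBATCH --error=python_error.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4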
module load python/3.8
python my_script.py
3. MATLAB Job
For MATLAB applications that use multithreading, specify the number of CPUs per task:
#!/bin/bash
#SBATCH --job-name=matlab_job
#SBATCH --output=matlab_output.txt
#SBATCH --error=matlab_error.txt
#SBATCH --cpus-per-task=4
module load matlab
matlab -nodisplay -r "run('my_matlab_script.m'); exit"
4. R Script
For submitting an R job:
#!/bin/bash
#SBATCH --job-name=r_job
#SBATCH --output=r_output.txt
#SBATCH --error=r_error.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
module load R
Rscript my_script.R
Tip: Adjust the resource parameters according to the specific requirements of your application.
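One way to inform that adjustment is to compare what a finished job requested with what it actually used. The sketch below assumes job accounting is enabled on the cluster so that `sacct` returns data; the function name `job_usage` and the job ID 12345 are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: summarize a finished job's elapsed time, peak memory,
# and allocated CPUs. $1 = job ID reported by sbatch.
job_usage() {
    local job_id="$1"
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found; run this on a cluster login node" >&2
        return 1
    fi
    sacct -j "${job_id}" --format=JobID,JobName,Elapsed,MaxRSS,AllocCPUS,State
}

# Example call (replace 12345 with a real job ID):
# job_usage 12345
```

If the reported peak memory or elapsed time is far below the request, a smaller request may let the job start sooner.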