Slurm is an open-source resource manager and job scheduler that allocates cluster resources to user jobs according to their requirements. It was originally developed at Lawrence Livermore National Laboratory.
sinfo provides information about the resources available on the cluster. Example:
[sorawid@slurmmaster ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
gpu* up infinite 2 idle ai[2-3]
From the output above, the cluster has a single partition named gpu (the * marks it as the default partition), with no time limit and two idle nodes, ai2 and ai3.
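If you need per-node detail, sinfo also accepts a node-oriented long listing; a minimal usage sketch (output omitted, and the exact columns depend on your Slurm version):
sinfo -N -l     # one line per node, including CPU, memory, and state
sinfo -p gpu    # restrict the listing to the gpu partition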
squeue provides information about jobs that are currently queued or running and the resources allocated to them.
[sorawid@slurmmaster ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[sorawid@slurmmaster ~]$ squeue -u root
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8 gpu bash root R 0:02 1 ai2
The output from squeue shows the JobID, the partition, the job name, the user who owns the job, the job state (ST), the elapsed time, the number of nodes allocated to the job, and which nodes those are.
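squeue also accepts filters beyond -u; a brief sketch using standard options:
squeue -t PENDING    # show only jobs still waiting for resources
squeue -l            # long format, which also shows each job's time limit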
To cancel a job, you can use scancel <JOBID>, where <JOBID> refers to the JobID assigned to your job by Slurm:
[sorawid@slurmmaster ~]$ scancel <JOBID>
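scancel can also act on several jobs at once; a sketch using its standard filter options (the user name is only an example):
scancel 66                       # cancel the job with JobID 66
scancel -u sorawid               # cancel all of sorawid's jobs
scancel -u sorawid -t PENDING    # cancel only sorawid's pending jobs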
scontrol show partition displays the detailed configuration of each partition, including its defaults, limits, and total resources:
[sorawid@slurmmaster ~]$ scontrol show partition
PartitionName=gpu
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=YES MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=ai[2-3]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=128 TotalNodes=2 SelectTypeParameters=NONE
JobDefaults=DefCpuPerGPU=1
DefMemPerCPU=512 MaxMemPerNode=UNLIMITED
TRES=cpu=128,mem=500000M,node=2,billing=128,gres/gpu=6
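scontrol show also works for other object types; a usage sketch (output omitted):
scontrol show node ai2    # CPU, memory, GRES, and state details for node ai2
scontrol show job 66      # full details of a queued or running job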
5.1 Show user’s QOS and DefaultQOS
sacctmgr show assoc format=account,user%15,qos%30,defaultqos%20
[sorawid@slurmmaster ~]$ sacctmgr show assoc format=account,user%15,qos%30,defaultqos%20
Account User QOS Def QOS
---------- --------------- ------------------------------ --------------------
root normal
root root normal
ext_users normal
g_gable normal
g_gable sorawid normal
ku_admins normal
ku_users normal
f_eng normal
f_sci normal
g_bio normal
g_bio peeranon normal
g_chem normal
g_phy normal
5.2 Show QOS limits
sacctmgr show qos format=name%15,GrpTRES,MaxTRESPU%35,MaxWall,MaxTRES,MaxJobsPU
[sorawid@slurmmaster ~]$ sacctmgr show qos format=name%15,GrpTRES,MaxTRESPU%35,MaxWall,MaxTRES,MaxJobsPU
Name GrpTRES MaxTRESPU MaxWall MaxTRES MaxJobsPU
--------------- ------------- ----------------------------------- ----------- ------------- ---------
normal cpu=8,gres/gpu=1,mem=64G
Each QOS is described in the following table:
| No | Name | Max CPU (cores) per user | Max RAM (GB) per user | Max GPU (units) per user | Max walltime per user (day-hour:min:sec) | Priority |
|---|---|---|---|---|---|---|
| 1 | normal | 8 | 64 | 1 | Unlimited | - |
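Since only the normal QOS exists, jobs use it by default. If additional QOS levels are defined later, you could request one explicitly with the standard --qos option (a sketch, shown here with the existing normal QOS):
#SBATCH --qos=normal
# or on the command line:
# srun --qos=normal -c 4 -p gpu --pty bash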
srun can be used to run a simple job directly, which is handy for testing.
[sorawid@slurmmaster ~]$ srun -c 4 -p gpu hostname
ai2.ku.io
[sorawid@slurmmaster ~]$ srun -c 4 -p gpu --gres=gpu:1 nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-6bddfd8d-6a4f-caac-25e8-751df27c24f6)
Using srun with the --pty bash option starts an interactive job. Interactive jobs are typically used to compile or test a program for a short period of time, before a long run is submitted with sbatch.
Example: requesting 4 CPU cores and 1 GPU for an interactive job.
[sorawid@slurmmaster ~]$ srun -c 4 -p gpu --gres=gpu:1 --pty bash
[sorawid@ai2 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-6bddfd8d-6a4f-caac-25e8-751df27c24f6)
[sorawid@ai2 ~]$ hostname
ai2.ku.io
[sorawid@ai2 ~]$ exit
exit
[sorawid@slurmmaster ~]$
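If the interactive session also needs a memory or time limit, the same resource flags used in batch scripts can be passed to srun; a sketch with illustrative values:
srun -c 4 -p gpu --gres=gpu:1 --mem=16G --time=01:00:00 --pty bash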
To submit a job in Slurm, you submit a shell script that outlines the resources you are requesting from the scheduler, the software needed for your job, and the commands you wish to run. The beginning of your submission script usually contains the hashbang (shebang) line specifying which interpreter should be used for the rest of the script; in this case we are using a bash shell, as indicated by #!/bin/bash. The next portion of your submission script tells Slurm what resources you are requesting; each request is preceded by #SBATCH followed by flags for the various parameters detailed below.
| Directive | Description |
|---|---|
| #SBATCH --job-name=test | Job name |
| #SBATCH --output=output%j.out | Standard output and error log (%j expands to the JobID) |
| #SBATCH --ntasks=4 | Number of tasks |
| #SBATCH --time=00:01:00 | Time limit (hrs:min:sec) |
| #SBATCH --cpus-per-task=1 | Number of CPUs per task |
| #SBATCH --gres=gpu:1 | Generic resources (e.g. GPUs) requested per node |
| #SBATCH --mem=16GB | Memory requested per node |
| #SBATCH --partition=gpu | Partition name |
| #SBATCH --test-only | Validate the script and estimate when the job would run, without submitting it |
Example of a Slurm submission script
[sorawid@slurmmaster ~]$ cat test.batch
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH -p gpu
#SBATCH -c 4
hostname
nvidia-smi -L
To submit the job, execute the sbatch command followed by the name of your submission script, for example:
[sorawid@slurmmaster ~]$ sbatch test.batch
Submitted batch job 66
Once you execute the above command, the job is queued until the requested resources are available to be allocated to it.
[sorawid@slurmmaster ~]$ cat slurm-66.out
ai2.ku.io
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-6bddfd8d-6a4f-caac-25e8-751df27c24f6)
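A fuller submission script that uses more of the directives from the table above might look like the following sketch (the resource values are illustrative and should be adjusted to your job):
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=output%j.out
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --mem=16GB
#SBATCH --time=01:00:00

hostname
nvidia-smi -L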
Two common solutions for creating parallel code are OpenMP and MPI. Both are used mainly with the C, C++, and Fortran programming languages.
OpenMP (“Open Multi-Processing”) is a compiler-level application programming interface (API) for creating code that runs on multiple threads. It follows a shared-memory model.
MPI (“Message Passing Interface”) is a library standard for handling parallel processing. Unlike OpenMP, MPI has much more flexibility in how individual processes handle memory. MPI is also compatible with multi-node structures, allowing for very large, multi-node applications (i.e, distributed memory models).
The hellohybrid.c file:
[sorawid@slurmmaster hybrid]$ cat hellohybrid.c
#include <stdio.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int iam = 0, np = 1;

  /* Initialize MPI and determine this process's rank and the total process count. */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  /* Each MPI process spawns an OpenMP thread team; every thread prints its
     thread number, its process rank, and the node it is running on. */
  #pragma omp parallel default(shared) private(iam, np)
  {
    np = omp_get_num_threads();
    iam = omp_get_thread_num();
    printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
           iam+1, np, rank+1, numprocs, processor_name);
  }

  MPI_Finalize();
  return 0;
}
[sorawid@slurmmaster ~]$ srun --pty bash
[sorawid@ai2 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
9 normal bash sorawid R 0:01 1 ai2
[sorawid@ai2 ~]$ cd hybrid/
[sorawid@ai2 hybrid]$ module load mpi
Loading mpi version 2021.9.0
[sorawid@ai2 hybrid]$ mpicc -fopenmp hellohybrid.c -o hellohybrid
[sorawid@ai2 hybrid]$ ls
hellohybrid hellohybrid.c
[sorawid@ai2 hybrid]$
[sorawid@slurmmaster hybrid]$ cat hybrid.sbatch
#!/bin/bash
# A job submission script for running a hybrid MPI/OpenMP job.
#SBATCH --job-name=hellohybrid
#SBATCH --output=hellohybrid.out
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8
module load mpi
# Set OMP_NUM_THREADS to the number of CPUs per task we asked for.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Run the process with mpirun. Note that the -n option is not required
# in this case; mpirun will automatically determine how many processes
# to run from the Slurm settings.
mpirun ./hellohybrid
After the script is submitted with sbatch, the output file hellohybrid.out contains:
Hello from thread 1 out of 8 from process 1 out of 1 on ai2
Hello from thread 8 out of 8 from process 1 out of 1 on ai2
Hello from thread 4 out of 8 from process 1 out of 1 on ai2
Hello from thread 5 out of 8 from process 1 out of 1 on ai2
Hello from thread 7 out of 8 from process 1 out of 1 on ai2
Hello from thread 6 out of 8 from process 1 out of 1 on ai2
Hello from thread 3 out of 8 from process 1 out of 1 on ai2
Hello from thread 2 out of 8 from process 1 out of 1 on ai2
[sorawid@slurmmaster hybrid]$
sacct displays accounting data for all jobs and job steps recorded in the Slurm job accounting log or Slurm database. Use this command when you need to look up information about your past jobs.
[sorawid@slurmmaster ~]$ sacct -j 10 --format=JobName,Account,AllocCPUS,State,nodelist
JobName Account AllocCPUS State NodeList
---------- ---------- ---------- ---------- ---------------
hellohybr+ cans 32 COMPLETED solid
batch cans 32 COMPLETED solid
extern cans 32 COMPLETED solid
[sorawid@slurmmaster ~]$
-j : job ID
JobName : job name
Account : the account under which the job ran
AllocCPUS : number of allocated CPUs
State : state of the job
nodelist : node(s) on which the job ran

This brief tutorial should provide the basics necessary for submitting jobs to the Slurm Workload Manager on the cluster.
💡 For more information on PBS vs. Slurm command equivalents: https://slurm.schedmd.com/rosetta.pdf