Migrating from PBS to SLURM

Beginning with the fall 2021 semester, the educational cluster's batch scheduler will be the Slurm Workload Manager.

This guide provides information on how to migrate your scripts and jobs from PBS to Slurm, if need be. There are two main aspects to the migration: learning the new commands for job submission and converting your job scripts. The concepts are the same in both schedulers, but the syntax of the commands, directives, and environment variables differs.

Equivalent Slurm commands exist for those commonly used in PBS, with the command names and options detailed in the following table.

Command Comparison

| Command | PBS (Torque/Moab) | Slurm |
| --- | --- | --- |
| Submit a Job | qsub [job-submit-script] | sbatch [job-submit-script] |
| Delete a Job | qdel [job-id] | scancel [job-id] |
| Queue List | qstat | squeue |
| Queue Info | qstat -q [queue] | scontrol show partition [partition] |
| Node List | pbsnodes -a [:queue] | scontrol show nodes |
| Node Details | pbsnodes [node] | scontrol show node [node] |
| Job Status (by job) | qstat [job-id] | squeue -j [job-id] |
| Job Status (by user) | qstat -u [user] | squeue -u [user] |
| Job Status (detailed) | qstat -f [job-id] | scontrol show job -d [job-id] |
| Show Expected Start Time | showstart [job-id] | squeue -j [job-id] --start |

For a comprehensive list of Slurm commands, please download this Command Reference PDF on SchedMD’s Website.
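
For example, a typical submit-and-monitor sequence under Slurm looks like the following (the script name and job ID are placeholders; substitute your own):

sbatch submit-script.slurm     # submit the job; Slurm replies with "Submitted batch job 12345"
squeue -u $USER                # list all of your queued and running jobs
squeue -j 12345 --start        # show the expected start time for job 12345
scancel 12345                  # delete job 12345 if it is no longer needed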

Existing PBS batch scripts can be readily migrated for use on the Slurm resource manager, with some minor changes to the directives and referenced environment variables. The more popular Slurm equivalent directives and environment variables are outlined below.

Directive Comparison

| Directive | PBS (Torque/Moab) | Slurm |
| --- | --- | --- |
| Script directive | #PBS | #SBATCH |
| Job name | -N [name] | --job-name=[name] |
| Queue / Partition | -q [queue] | --partition=[queue] |
| Wall time limit | -l walltime=[hh:mm:ss] | --time=[hh:mm:ss] |
| Node count | -l nodes=[count] | --nodes=[count] |
| CPU count per node | -l ppn=[count] | --ntasks-per-node=[count] |
| Memory size | -l mem=[limit] (per job) | --mem=[limit] (per node) |
| Memory per CPU | -l pmem=[limit] | --mem-per-cpu=[limit] |
| Standard output file | -o [filename] | --output=[filename] |
| Standard error file | -e [filename] | --error=[filename] |
| Combine stdout/stderr | -j oe (to stdout) | (default behavior) |
| Copy environment | -V | --export=ALL (default) |
| Copy env variable | -v [var] | --export=[var] |
| Job dependency | -W depend=[state:jobid] | --dependency=[state:jobid] |
| Event notification | -m abe | --mail-type=[events] |
| Email address | -M [address] | --mail-user=[address] |

For a full list of directives, please consult SchedMD’s sbatch Webpage.
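
As an illustration of how these directives map, here is a minimal, hypothetical submit script in both formats (the job name, queue/partition name, resource requests, and application command are placeholders; adjust them for your own work). A PBS script like this:

#!/bin/bash
#PBS -N myjob
#PBS -q batch
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=4
#PBS -l mem=8gb

cd $PBS_O_WORKDIR
./my_application

would become the following Slurm script:

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=batch
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=8gb

cd $SLURM_SUBMIT_DIR
./my_application

Note that -l mem is a per-job limit in PBS while --mem is a per-node limit in Slurm; for a single-node job like this one they are equivalent, but multi-node jobs may need the memory request adjusted.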

Environment Variable Comparison

| Description | PBS (Torque/Moab) | Slurm |
| --- | --- | --- |
| Job Name | $PBS_JOBNAME | $SLURM_JOB_NAME |
| Job ID | $PBS_JOBID | $SLURM_JOB_ID |
| Submit Directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR |
| Submit Host | $PBS_O_HOST | $SLURM_SUBMIT_HOST |
| Node List | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST |
| Job Array Index | $PBS_ARRAYID | $SLURM_ARRAY_TASK_ID |
| Queue Name | $PBS_QUEUE | $SLURM_JOB_PARTITION |
| Number of Nodes | $PBS_NUM_NODES | $SLURM_NNODES |
| Number of Procs | $PBS_NP | $SLURM_NTASKS |
| Procs per Node | $PBS_NUM_PPN | $SLURM_CPUS_ON_NODE |

For a full list of environment variables, please consult the Environment Variables Section on SchedMD’s sbatch Webpage.
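
As a sketch, the Slurm variables can be used inside a submit script much like their PBS counterparts; for example, the following lines (for illustration only) print some basic information about the job to the output file:

echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) submitted from $SLURM_SUBMIT_DIR"
echo "Running on $SLURM_NNODES node(s): $SLURM_JOB_NODELIST"
echo "Using $SLURM_NTASKS task(s) in partition $SLURM_JOB_PARTITION"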

Tips on Converting Submit Scripts

We have created a utility called “p2s” (for PBS-to-Slurm), which is available on our Interactive/Submit hosts in your standard $PATH. Simply pass it the name of the script you would like to convert (provide the full path if you are not in the same directory as the script), and it will convert it from a PBS to a Slurm submit script. For example, to convert a PBS submit script named “submit-script.pbs”, issue the following command on the Interactive/Submit host:

p2s submit-script.pbs

This prints the converted script directly to STDOUT, so that you can review it right in your SSH session. If everything looks good and you would like to save the converted script to a new file, simply redirect the output to a file name of your choice (we recommend using a NEW file name rather than overwriting the existing file):

p2s submit-script.pbs > submit-script.slurm

There are many other conversion scripts available online, and you are welcome to download and try them out in our environment, or you can convert your scripts manually using the directives and environment variables listed above. Once a script is converted, you may still need to make some small tweaks or edits to get it fully ready for submission to Slurm.

We also have a collection of example Slurm submit scripts that you can copy and use as a template: /apps/slurm/examples
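
For example, you can list the available examples and copy one to use as a starting point (the example script name and destination file name below are placeholders):

ls /apps/slurm/examples
cp /apps/slurm/examples/[example-script] ~/my-job.slurm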

Environment Modules: Check your “module load” lines

While you’re converting your scripts from PBS to Slurm, you may also need to check your “module load” commands. Many of the scientific applications have been upgraded since previous semesters, so the version of a given application you used before may no longer be available this semester. Please verify that your application(s) *and* version(s) exist on the cluster prior to job submission. TensorFlow is a great example: if you have a TensorFlow v1.14 submit script from last semester, it will not work this semester, because we have retired that version and only offer v2.2.0 and v2.4.1:

# Spring 2021
$ module avail tensorflow
------------------------ /apps/usr/slurm/modules/apps ------------------------------
tensorflow/1.14-anaconda3-cuda10.0         tensorflow/2.2-anaconda3-cuda10.2
tensorflow/2.0-anaconda3-cuda10.0(default)
# Fall 2021
$ module avail tensorflow
-------------------------- /apps/usr/modules/apps ----------------------------------
tensorflow/2.2.0-cuda10.2  tensorflow/2.4.1-cuda11.2(default)

So if you submit a TensorFlow job whose script contains the command “module load tensorflow/1.14-anaconda3-cuda10.0”, you will get the following error in your job output file:

ERROR: Unable to locate a modulefile for 'tensorflow/1.14-anaconda3-cuda10.0'

In order to submit the job and have it successfully processed, you must update the “module load” line to use one of the current tensorflow versions, for example:

module load tensorflow/2.4.1-cuda11.2

The best way to verify that modules will load prior to job submission is to run each “module load …” command from your submit script interactively at the command line on the submit host. If a “module load” line produces an error when run interactively, it will produce the same error in a compute job.
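
For example, before submitting the TensorFlow job above, you could run the following on the submit host:

$ module load tensorflow/2.4.1-cuda11.2
$ module list

If the “module load” completes without errors and “module list” shows the module, the same line will work in your submit script.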

For more information on Environment Modules, please check out our FAQ.

For more information about the Slurm Workload Manager, please check out the Slurm Documentation on SchedMD’s Website.