Queue and Priority
Jobs on WATGPU are run according to their priority. Higher-priority jobs move ahead in the queue and can preempt running lower-priority jobs, requeuing them to free resources for themselves.
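On Slurm-based clusters such as WATGPU, the standard Slurm tools can be used to inspect the queue and the priority of pending jobs. A sketch (the format string is just one reasonable choice; `sprio` output depends on the scheduler's priority plugin):

```shell
# List jobs sorted by descending priority ("-p"), showing job id,
# partition, priority value (%Q), state, elapsed time, and reason/node.
squeue --sort=-p,i --format="%.10i %.12P %.10Q %.8T %.10M %R"

# Show the factors (age, fair-share, partition, ...) that make up each
# pending job's priority; available when multifactor priority is enabled.
sprio -l
```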
To manage priority, WATGPU's resources are divided into several partitions. The partition a job runs on determines both the resources it can be allocated and its priority. Selecting a partition is straightforward:
- For interactive sessions: add the `--partition=<PARTITION>` argument to the `salloc` command.
- For batch jobs: add `#SBATCH --partition=<PARTITION>` to your `.sh` script.
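Putting the batch-job option in context, a minimal submission script might look like the following sketch (the time limit, GPU count, and workload are placeholders to adapt):

```shell
#!/bin/bash
#SBATCH --partition=<PARTITION>   # one of the partitions described below
#SBATCH --gres=gpu:1              # request one GPU
#SBATCH --time=01:00:00           # wall-clock limit
#SBATCH --job-name=example

# Your workload goes here.
srun python train.py
```

Submit it with `sbatch job.sh`; for an interactive session, the same partition choice is passed directly, e.g. `salloc --partition=<PARTITION> --gres=gpu:1`.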
The following partitions are available on WATGPU:
ALL
This is the default partition. A job submitted to ALL can run on any available resource on WATGPU.
Jobs have standard queue priority and may be preempted and requeued if a higher-priority job needs resources.
All users have access to this partition:
#SBATCH --partition=ALL
SCHOOL
A job submitted to SCHOOL is allocated only to GPUs owned by the School. Although SCHOOL offers fewer resources than ALL, the chance of being preempted by a higher-priority job is lower.
Jobs have standard queue priority and may be preempted and requeued if a higher-priority job needs resources.
All users within the School have access to this partition:
#SBATCH --partition=SCHOOL
<GROUP>
When a group <GROUP> has entrusted its GPU(s) to WATGPU, it gains access to a partition named <GROUP> linked to those contributed GPU(s).
Jobs on this partition have high queue priority and will preempt and requeue lower-priority jobs (from ALL and SCHOOL) when resources are scarce, up to the number of GPUs the group contributed.
All users in the group have access to their group partition:
#SBATCH --partition=<GROUP>
Users requesting GPUs through their group's partition must now specify the group name in the GPU request:
#SBATCH --gres=gpu:<GROUP_name>gpu:1
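For example, a group member's script combining both options might look like this sketch (`<GROUP>` and `<GROUP_name>` are placeholders for the actual group partition and GPU type names):

```shell
#!/bin/bash
#SBATCH --partition=<GROUP>            # the group's high-priority partition
#SBATCH --gres=gpu:<GROUP_name>gpu:1   # one GPU of the group's contributed type
#SBATCH --time=04:00:00

# Your workload goes here.
srun python train.py
```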