Job Priority – Fair-share

The Slurm job scheduler uses a multi-factor system to calculate each job’s priority in the queue, ensuring equitable and fair access to the cluster’s shared resources. It is not a FIFO (first in, first out) system. Instead, the job owner’s recent activity on the cluster (the fair-share value), the resources the job requests, and how long the job needs to run are all weighed proportionately to produce an overall priority value that determines the job’s position in the queue. This prioritized queue is then augmented by the backfill plugin, which looks for gaps in the resource allocations where smaller, shorter jobs can squeeze in without delaying the expected start times of the other jobs in the queue.

To give your job the best chance of being scheduled as soon as possible, request only slightly more time and resources than the job actually needs. Tight, accurate requests make it far more likely that the backfill plugin can find a gap where your job fits and start it early.

The more jobs you submit within a one- to two-week window, the lower your fair-share value falls, in proportion to the “billing” weight of those jobs. If you submit few or no jobs, your fair-share value recovers over time. Each pending job’s priority is recalculated several times per hour to reflect changes in the job owner’s fair-share value, and any job waiting in the queue accrues an age factor that raises its priority the longer it remains pending.
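Slurm’s classic fair-share algorithm (see the links at the end of this page) computes the fair-share factor as 2^(−U/S), where U is the user’s decayed historical usage and S is their allotted share of the cluster, both normalized. The sketch below is illustrative only: the real calculation also walks the account hierarchy, and the half-life value is set by the cluster’s PriorityDecayHalfLife parameter (the one-week value here is an assumption, not Andromeda’s actual setting).

```python
def fairshare_factor(norm_usage: float, norm_shares: float) -> float:
    """Classic Slurm fair-share: F = 2 ** (-U / S).

    norm_usage  -- user's decayed usage as a fraction of total cluster usage
    norm_shares -- user's allotted fraction of the cluster
    """
    if norm_shares <= 0.0:
        return 0.0
    return 2.0 ** (-norm_usage / norm_shares)


def decayed_usage(raw_usage: float, seconds_since: float, half_life_s: float) -> float:
    """Historical usage decays exponentially with PriorityDecayHalfLife."""
    return raw_usage * 0.5 ** (seconds_since / half_life_s)


# A user entitled to 10% of the cluster who has recently consumed 10% of it
# gets F = 2^(-1) = 0.5; a user with no recent usage gets the maximum, 1.0.
print(fairshare_factor(0.10, 0.10))   # 0.5
print(fairshare_factor(0.00, 0.10))   # 1.0

# After one half-life (here assumed to be one week), recorded usage halves,
# which is why the fair-share value "goes back up over time".
week = 7 * 24 * 3600
print(decayed_usage(100.0, week, week))  # 50.0
```

This is why a burst of heavy submissions depresses your priority only temporarily: the usage term in the exponent shrinks every week you stay idle.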

Job Priority and Fairshare Breakdown

1. Priority Key Factors

The priority of a job is determined by several sub-factors, including:

  • Fair-Share Factor: This compares the user’s allotted share of resources with their recent historical usage; users who have used less than their share receive a higher factor.
  • Age Factor: This gives higher priority to jobs that have been waiting in the queue longer.
  • Job Size Factor: This considers the number of nodes or processors a job requests.
  • Partition Factor: This takes into account the partition (or queue) the job is submitted to, which may have different priorities.

2. Priority Formula

SLURM uses the following formula to calculate the priority for each job:

Job_priority = 
(PriorityWeightAge)       * (age_factor)        +
(PriorityWeightFairshare) * (fair-share_factor) + 
(PriorityWeightJobSize)   * (job_size_factor)   + 
(PriorityWeightPartition) * (partition_factor)  + 
(PriorityWeightQOS)       * (QOS_factor)        +
(possibly some more advanced factors that are not relevant for Andromeda)

All of the factors in this formula are floating-point numbers between 0.0 and 1.0, while the weights are integer values that determine how heavily each factor counts toward the total.
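The weighted sum above can be written directly in code. A minimal sketch, using the Andromeda weights listed in the next section and a purely hypothetical set of factor values:

```python
def job_priority(factors: dict, weights: dict) -> int:
    """Weighted sum of priority factors, as in the formula above.

    Each factor is a float in [0.0, 1.0]; each weight is an integer.
    Slurm reports the final priority as an integer.
    """
    for name, f in factors.items():
        assert 0.0 <= f <= 1.0, f"factor {name} out of range"
    return int(sum(weights[name] * f for name, f in factors.items()))


# Hypothetical job: well aged, average fair-share, very small size,
# submitted to a partition with the full partition factor.
factors = {"age": 0.8, "fairshare": 0.5, "job_size": 0.01, "partition": 1.0}
weights = {"age": 500000, "fairshare": 1000000, "job_size": 250000, "partition": 1000}

print(job_priority(factors, weights))  # 903500
```

Because the fair-share weight is the largest, a swing in the fair-share factor moves the final priority more than an equal swing in any other factor.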

3. Job Priority Calculation on Andromeda

On Andromeda, we use the following weights in our SLURM configuration:

[johnchris@l001 ~]$ sprio -w

          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION
        Weights                                1     500000    1000000     250000       1000
PriorityWeightAge       = 500000
PriorityWeightFairShare = 1000000
PriorityWeightJobSize   = 250000
PriorityWeightPartition = 1000

This means that the priority of a job is determined mainly by fair-share, waiting age, and job size, with a smaller influence from the partition it is submitted to.

SLURM job priorities can be queried using the sprio utility. Below, the -S '-Y' option sorts jobs by priority in descending order:

[johnchris@l001 ~]$ sprio -S '-Y'

          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION
        2458888 shared       1045915          0     500000     544348        568       1000
        2486756 exclusive     536587          0     500000      22609      12979       1000
        2486757 exclusive     536587          0     500000      22609      12979       1000
        2486759 exclusive     536587          0     500000      22609      12979       1000
        2486760 exclusive     536587          0     500000      22609      12979       1000
        2486761 exclusive     536587          0     500000      22609      12979       1000
        2486818 exclusive     534228          0     500000      22609      10619       1000

We can see that Job ID 2458888 has the highest priority: 1045915 ≈ 500000 (AGE) + 544348 (FAIRSHARE) + 568 (JOBSIZE) + 1000 (PARTITION). (The displayed components can differ from the total by a point or so, because sprio rounds each weighted factor to an integer.)

4. Backfill

In the main scheduling cycle, jobs are started in order of priority as resources become available. All jobs are also considered for “backfill”: a lower-priority job may start before a higher-priority one if it can run in currently idle resources without delaying the higher-priority job’s expected start time. For example, if a high-priority job needs 20 nodes with 48 cores each and must wait 26 hours for those resources to free up, SLURM will meanwhile run a low-priority job that only needs a couple of cores for an hour.
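The backfill decision can be illustrated with a toy model: a lower-priority job may start now only if it fits in the currently idle resources and its requested walltime ends before the reserved start time of the highest-priority waiting job. This is a simplified sketch with made-up numbers; real backfill tracks per-node availability and many reservations, not a single core count.

```python
def can_backfill(req_cores: int, req_hours: float,
                 idle_cores: int, hours_until_next_start: float) -> bool:
    """A lower-priority job can be backfilled if it fits in the idle
    cores AND is guaranteed to finish (per its requested walltime)
    before the highest-priority pending job's expected start time."""
    return req_cores <= idle_cores and req_hours <= hours_until_next_start


# The scenario from the text: 2 cores for 1 hour squeezes in while a
# 20-node job waits 26 hours for its resources to free up.
print(can_backfill(2, 1.0, idle_cores=8, hours_until_next_start=26.0))   # True

# The same small job requesting 48 hours would delay the big job's
# reserved start, so it stays in the queue.
print(can_backfill(2, 48.0, idle_cores=8, hours_until_next_start=26.0))  # False
```

This is also why the earlier advice matters: the scheduler trusts your requested walltime, so a padded request can disqualify a job from a gap it would actually have fit in.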

5. Fairshare 

Fairshare is a scheduling policy designed to ensure equitable access to computing resources among users or groups. It balances resource allocation based on historical usage and predefined priorities.

Our cluster is not a FIFO (first in, first out) system. Instead, the scheduler aims to give all groups and users equitable access to computing resources over time.

So be careful when running jobs and request only what you need (i.e., CPU cores, memory, GPUs, and time). Requesting excess cluster resources will lower the priority of your subsequent jobs.

For more information, please visit the following links.

https://slurm.schedmd.com/classic_fair_share.html

https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf
