We use a multifactor method of job scheduling on Navigator.
Job priority is assigned by a combination of fair-share, partition priority, and length of time a job has been sitting in the queue.
The priority of the partition (queue) carries the most weight in the job priority calculation. For certain partitions, this will cause jobs in lower-priority partitions that overlap with that partition to be requeued.
The second most important factor is fair-share score. You can find a description of how SLURM calculates fair-share here.
The third most important factor is how long your job has been sitting in the queue. The longer your job waits, the higher its priority grows. If all jobs have equal priority, they are scheduled first in, first out (FIFO).
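As a rough illustration of how the three factors combine, the sketch below computes a priority as a weighted sum, in the spirit of SLURM's multifactor priority plugin. The weights and factor values are made up for illustration and are not Navigator's actual configuration. Factors are written as integers from 0 to 1000 (thousandths) because shell arithmetic is integer-only.

```shell
#!/bin/sh
# Hypothetical weights, chosen only so that partition > fair-share > age.
# These are NOT Navigator's real settings.
WEIGHT_PARTITION=10000
WEIGHT_FAIRSHARE=5000
WEIGHT_AGE=1000

# job_priority PARTITION_FACTOR FAIRSHARE_FACTOR AGE_FACTOR
# Each factor is an integer in 0..1000, standing in for the range 0.0 to 1.0.
job_priority() {
    echo $(( (WEIGHT_PARTITION * $1 + WEIGHT_FAIRSHARE * $2 + WEIGHT_AGE * $3) / 1000 ))
}

# Job A: high-priority partition, low fair-share, just submitted.
job_priority 1000 200 0      # prints 11000
# Job B: low-priority partition, full fair-share, maximum queue age.
job_priority 100 1000 1000   # prints 7000
```

With these assumed weights, Job A outranks Job B even though Job B has a better fair-share score and has waited longer, because the partition factor dominates.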
To see your job's current priority, run
sprio -j <JOBID>
which will show you how each factor contributes to your job's priority.
If you run
sshare -u <USERNAME>
you can see your current fair-share and usage.
We also have backfill turned on.
This allows smaller jobs to sneak in while a larger, higher-priority job is waiting for nodes to free up.
If your job can finish in the time it takes for the other job to get all the nodes it needs, SLURM will schedule you to run during that period.
This means that knowing how long your code will run is very important, and you must declare a run time if you wish to leverage this feature.
Otherwise, the scheduler will assume your job will use the maximum time allowed for the partition.
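A minimal sketch of a batch script that declares its run time with the sbatch --time option, so that the backfill scheduler can consider it for gaps. The job name, the 30-minute limit, and the placeholder workload are assumptions for illustration. Choose a limit slightly above your code's measured run time.

```shell
#!/bin/sh
# Hypothetical job name, used only for this example.
#SBATCH --job-name=short_job
# Declared wall time in hours:minutes:seconds. A shorter, realistic
# limit makes it easier for the job to fit into a backfill window.
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks=1

# Placeholder workload; replace with your actual program.
start_time=$(date)
echo "Job started at ${start_time}"
```

Save this as, for example, short_job.sh and submit it with sbatch short_job.sh. Running squeue -u <USERNAME> --start reports the scheduler's estimated start times for your pending jobs. Note that SLURM will terminate a job that exceeds its declared time limit, so do not set the limit below what the code actually needs.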