Navigator computers run the CentOS distribution of the Linux operating system and commands are run under the "bash" shell. There are a number of Linux and bash references, cheat sheets and tutorials available on the web.
The default installation has been modified to meet the computational needs we support.
If you run into trouble compiling or running your code, feel free to contact us. When asking for help, it is very important to give us enough information to reproduce the problem: the environment in which it occurred, the relevant logs, and the exact steps you took.
Because of the diversity of investigations currently supported by LCA, many applications and libraries are supported on the Navigator cluster. Technically, it is impossible to include all of these tools in every user's environment. The Linux module system is used to enable subsets of these tools for a particular user's computational needs.
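As a rough illustration of how the module system is typically used, the sketch below wraps `module load` and `module list` in a small helper. The helper name `load_toolchain` and the module name `gcc/8.3.0` are assumptions made for this example; run `module avail` on the cluster to see the names that actually exist.

```shell
#!/bin/bash
# Sketch of a typical Environment Modules session.
# "gcc/8.3.0" is a hypothetical module name; check `module avail` for real ones.

load_toolchain() {
    local name="$1"
    # `module` is normally a shell function set up by the Environment Modules
    # package; skip gracefully where it is not defined (e.g. off the cluster).
    if ! type module >/dev/null 2>&1; then
        echo "module command not available; skipping load of ${name}" >&2
        return 1
    fi
    # Load the requested module, then show everything currently loaded.
    module load "${name}" && module list
}

load_toolchain "gcc/8.3.0" || true
```

To return to the default environment, `module unload <name>` removes a single module and `module purge` removes all loaded modules.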
Navigator provides the default gcc and gfortran for the installed CentOS release, GCC 4.8.5. Because this compiler version is old, Navigator also provides GCC 5.4.0 and GCC 8.3.0 (gcc and gfortran) through environment modules.
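One way to check whether the compiler currently on your path is new enough for your code is sketched below. The helper `version_at_least` and the minimum version in `required` are hypothetical and only serve the example; the comparison relies on GNU `sort -V`.

```shell
#!/bin/bash
# Return success if version $1 is at least version $2 (uses GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical minimum GCC version needed by your code.
required="5.4.0"

if command -v gcc >/dev/null 2>&1; then
    current="$(gcc -dumpversion)"
    if version_at_least "${current}" "${required}"; then
        echo "gcc ${current} is new enough"
    else
        echo "gcc ${current} is older than ${required}; load a newer GCC module"
    fi
fi
```

With only the default compiler on the path, this would report that GCC 4.8.5 is older than the requested minimum.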
As many users expect, Navigator also provides a recent version of the Intel compilers and the MKL linear algebra library: Intel Parallel Studio XE Cluster Edition 2019 Update 3. These can be used through an environment module.
SLURM is a queue management system and stands for Simple Linux Utility for Resource Management. SLURM was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world.
SLURM is similar in many ways to TORQUE or most other queue systems. You must write a batch script then submit it to the queue manager. The queue manager then schedules your job to run on the partition (or queue in TORQUE) that you designate. Below we will provide an outline of how to submit jobs to SLURM, how SLURM decides when to schedule your job and how to monitor progress.
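A minimal sketch of the write-then-submit workflow is shown below. The file name `hello_job.sh`, the job name, and the partition name `short` are assumptions for illustration; use `sinfo` to find the partitions that actually exist on Navigator.

```shell
#!/bin/bash
# Write a minimal batch script. Values marked hypothetical must be adapted.
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Job name shown by squeue
#SBATCH --job-name=hello
# Hypothetical partition name; list the real ones with sinfo
#SBATCH --partition=short
# One task, as for a serial job
#SBATCH --ntasks=1
# Wall-clock limit (HH:MM:SS)
#SBATCH --time=00:05:00
# Output file; %j expands to the job ID
#SBATCH --output=hello_%j.out

echo "Running on $(hostname)"
EOF

# Submit only where SLURM is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
fi
```

The `#SBATCH` lines must come before the first executable command in the script, since sbatch stops reading directives at that point.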
SLURM has a number of features that make it more suited to our environment than TORQUE:
The primary source of documentation on SLURM usage and commands is the SLURM site. If you search the web for SLURM questions, you'll often see the Lawrence Livermore pages as the top hits, but these tend to be outdated.
A great way to get details on the SLURM commands is the man pages available from the Navigator cluster login node. For example, if you type the following command:
man sbatch
you'll get the manual page for the sbatch command.
Since most users are familiar with the TORQUE queue management system, we provide a short list of common tasks with the SLURM commands and the corresponding TORQUE commands.
| Task | SLURM | TORQUE |
|---|---|---|
| Submit a batch serial job | sbatch | qsub |
| Kill a job | scancel | qdel |
| Check current job by id | sacct (e.g. sacct -j <JOBID>) | checkjob |
| View status of queues | squeue | qstat |
| View information about nodes and partitions | sinfo (e.g. sinfo -N; sinfo --long) | showq |
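The SLURM commands listed above can be chained into a simple submit-and-monitor sequence, sketched here. The helper `job_id_from_sbatch` and the script name `my_job.sh` are hypothetical; the helper assumes sbatch prints its usual "Submitted batch job <id>" message.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's "Submitted batch job <id>" message.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Run only where SLURM is installed and the (hypothetical) script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f my_job.sh ]; then
    jobid="$(sbatch my_job.sh | job_id_from_sbatch)"
    # Is the job pending or running?
    squeue -j "${jobid}"
    # Accounting record for the job
    sacct -j "${jobid}"
    # Uncomment to cancel the job
    # scancel "${jobid}"
fi
```

Alternatively, `sbatch --parsable` prints the job ID directly, which avoids parsing the message.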