Slurm Status Dashboard

While the command-line interface (CLI) remains the primary way to interact with the Slurm scheduler, we recognize that squeue and sinfo output can be difficult to read at a glance.

We use a software package called HPC Dashboard to provide a graphical, real-time view of the cluster's state. To access the VACC Slurm Dashboard, you must be on the UVM campus network or connected to the UVM VPN.

VACC HPC Dashboard

Notable Features

Queue Status

In the lower-right corner, you can see the number of jobs running on the cluster and the number pending in the queue.
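Similar counts can be pulled from the command line. The sketch below is not how the dashboard computes its numbers. It assumes squeue --noheader --format=%T, which prints one job state per line (for example RUNNING or PENDING), and tallies the states. The function names are hypothetical.

```python
import shutil
import subprocess
from collections import Counter


def count_job_states(squeue_output: str) -> Counter:
    """Tally job states from 'squeue --noheader --format=%T' output (one state per line)."""
    return Counter(line.strip() for line in squeue_output.splitlines() if line.strip())


def fetch_job_states() -> str:
    """Run squeue and return its raw output; needs the Slurm client tools on PATH."""
    result = subprocess.run(
        ["squeue", "--noheader", "--format=%T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("squeue") is None:
        print("squeue not found; run this on a cluster login node")
    else:
        states = count_job_states(fetch_job_states())
        # Counter returns 0 for states that never appear.
        print(f"Running: {states['RUNNING']}  Pending: {states['PENDING']}")
```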

Summary Node Status

The System Activity section shows how many nodes are in each state (e.g., Idle, Allocated) and what percentage of the cluster each state represents.
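A rough CLI version of this summary is sketched below, assuming sinfo --Node --noheader --format="%N %T" (node name and node state, one line per node per partition). Because a node in several partitions can appear more than once, the sketch de-duplicates by node name before counting. The trailing flag characters it strips and the function name are assumptions for illustration, and the dashboard may group states differently.

```python
from __future__ import annotations

import shutil
import subprocess
from collections import Counter

# Flag characters sinfo can append to a node state (e.g. "*" = not responding).
STATE_FLAGS = "*~#!%$@^-"


def summarize_node_states(sinfo_output: str) -> dict[str, tuple[int, float]]:
    """Map each node state to (node count, percent of all nodes).

    Expects 'sinfo --Node --noheader --format="%N %T"' output. With --Node, a node
    in several partitions is listed once per partition, so lines are
    de-duplicated by node name before counting.
    """
    state_by_node: dict[str, str] = {}
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, state = parts
        state_by_node[node] = state.rstrip(STATE_FLAGS).lower()
    counts = Counter(state_by_node.values())
    total = sum(counts.values())
    return {state: (n, 100.0 * n / total) for state, n in counts.items()}


if __name__ == "__main__":
    if shutil.which("sinfo") is None:
        print("sinfo not found; run this on a cluster login node")
    else:
        out = subprocess.run(
            ["sinfo", "--Node", "--noheader", "--format=%N %T"],
            capture_output=True, text=True, check=True,
        ).stdout
        for state, (n, pct) in sorted(summarize_node_states(out).items()):
            print(f"{state:<12} {n:>5} nodes  {pct:5.1f}%")
```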

Visual Node Status

Color-coded heat maps show at a glance which partitions are busy and which have free capacity. This helps you identify available resources before submitting a job.
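A text-only approximation of the heat map can be built from sinfo --noheader --format="%P %C", where %C reports CPU counts as allocated/idle/other/total. The sketch below computes the percentage of allocated CPUs per partition, summing lines that belong to the same partition. The thresholds in heat_label are hypothetical and are not the dashboard's actual color scale.

```python
from __future__ import annotations

import shutil
import subprocess
from collections import defaultdict


def partition_cpu_usage(sinfo_output: str) -> dict[str, float]:
    """Percent of CPUs allocated in each partition.

    Expects 'sinfo --noheader --format="%P %C"' output, where %C is
    "allocated/idle/other/total". Lines for the same partition are summed.
    """
    alloc: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name = parts[0].rstrip("*")  # sinfo marks the default partition with '*'
        a, _idle, _other, t = (int(x) for x in parts[1].split("/"))
        alloc[name] += a
        total[name] += t
    return {p: (100.0 * alloc[p] / total[p] if total[p] else 0.0) for p in total}


def heat_label(percent: float) -> str:
    """Hypothetical text stand-in for a color scale."""
    if percent >= 90:
        return "busy"
    if percent >= 50:
        return "moderate"
    return "available"


if __name__ == "__main__":
    if shutil.which("sinfo") is None:
        print("sinfo not found; run this on a cluster login node")
    else:
        out = subprocess.run(
            ["sinfo", "--noheader", "--format=%P %C"],
            capture_output=True, text=True, check=True,
        ).stdout
        for part, pct in sorted(partition_cpu_usage(out).items()):
            print(f"{part:<15} {pct:5.1f}% CPUs allocated  ({heat_label(pct)})")
```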

Job Efficiency Scoring

You can look up your past job IDs to view detailed statistics. The dashboard assigns an efficiency score for CPU and memory usage, which helps you right-size your resource requests (--mem and --cpus-per-task) for future submissions.
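The dashboard's exact scoring formula is not documented here. A common approach, similar to what Slurm's contributed seff utility reports, is CPU efficiency = TotalCPU / (Elapsed × AllocCPUS) and memory efficiency = MaxRSS / requested memory. The sketch below applies those formulas to fields you could retrieve with sacct -j JOBID --parsable2 --noheader --format=JobID,Elapsed,TotalCPU,AllocCPUS,MaxRSS,ReqMem (JOBID is a placeholder; MaxRSS is reported on job steps such as the .batch line). It assumes memory values carry a K/M/G/T suffix, and the function names are hypothetical.

```python
# Multipliers to MiB for Slurm's binary memory suffixes (assumed K/M/G/T only).
TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}


def slurm_time_to_seconds(value: str) -> float:
    """Convert sacct durations such as '1-02:03:04', '02:03:04' or '03:04.500' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    for field in value.split(":"):
        seconds = seconds * 60 + float(field)  # fold [HH:]MM:SS left to right
    return days * 86400 + seconds


def mem_to_mib(value: str) -> float:
    """Convert a memory string with a K/M/G/T suffix (e.g. '1048576K', '4G') to MiB."""
    unit = value[-1].upper()
    if unit not in TO_MIB:
        raise ValueError(f"expected a K/M/G/T suffix, got {value!r}")
    return float(value[:-1]) * TO_MIB[unit]


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Percent of allocated core-time actually spent computing."""
    core_seconds = slurm_time_to_seconds(elapsed) * alloc_cpus
    if core_seconds == 0:
        return 0.0
    return 100.0 * slurm_time_to_seconds(total_cpu) / core_seconds


def mem_efficiency(max_rss: str, req_mem: str) -> float:
    """Peak resident memory as a percent of the memory requested."""
    requested = mem_to_mib(req_mem)
    return 100.0 * mem_to_mib(max_rss) / requested if requested else 0.0
```

For example, a four-CPU job that ran for one hour (Elapsed 01:00:00) and accumulated two hours of CPU time (TotalCPU 02:00:00) gives cpu_efficiency("02:00:00", "01:00:00", 4) = 50.0, which suggests that a smaller --cpus-per-task request might have been enough.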

Filters

Filter the cluster view by node type (GPU, CPU), partition (general, nvgpu, etc.), feature type (AMD, INTEL, H200, etc.), and node state (idle, allocated, etc.).
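On the command line, sinfo can already filter by partition (--partition) and by state (--states). The sketch below shows one way to combine those with feature filtering on the client side. It assumes output from sinfo --Node --noheader --format="%N|%P|%f|%T" (node, partition, comma-separated features, state) and that nodes without features report "(null)". The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeRecord:
    name: str
    partition: str
    features: set[str]
    state: str


def parse_nodes(sinfo_output: str) -> list[NodeRecord]:
    """Parse 'sinfo --Node --noheader --format="%N|%P|%f|%T"' output."""
    records = []
    for line in sinfo_output.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        name, partition, features, state = parts
        records.append(NodeRecord(
            name=name,
            partition=partition.rstrip("*"),  # '*' marks the default partition
            features=set(features.split(",")) if features != "(null)" else set(),
            state=state.rstrip("*~#!%$@^-").lower(),  # drop trailing state flags
        ))
    return records


def filter_nodes(nodes: list[NodeRecord], partition: str | None = None,
                 feature: str | None = None, state: str | None = None) -> list[NodeRecord]:
    """Keep nodes that match every filter that is not None."""
    return [
        n for n in nodes
        if (partition is None or n.partition == partition)
        and (feature is None or feature in n.features)
        and (state is None or n.state == state)
    ]
```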

Cluster Transparency

Click an individual node to see its current load and running jobs. This makes it easy to tell whether a node is under heavy load or whether specific resources (such as GPUs, memory, or CPUs) are fully utilized.
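From the command line, scontrol show node NAME reports similar per-node counters (CPUAlloc, CPUTot, CPULoad, RealMemory, AllocMem, AllocTRES), and squeue --nodelist=NAME lists the jobs running on that node. The sketch below turns the scontrol output into allocation percentages; the dashboard may compute its figures differently. It is a simplified parser that splits on whitespace, so values containing spaces are truncated, and it assumes allocated GPUs appear in AllocTRES as gres/gpu=N. The function names are hypothetical.

```python
from __future__ import annotations


def parse_scontrol_node(text: str) -> dict[str, str]:
    """Split 'scontrol show node NAME' output into Key=Value pairs.

    Values containing spaces (e.g. OS=..., Reason=...) are cut at the first
    space; the numeric fields used below are unaffected.
    """
    fields: dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def node_utilization(fields: dict[str, str]) -> dict[str, float]:
    """Percent of CPUs and memory allocated, plus the reported CPU load."""
    load = fields.get("CPULoad", "N/A")
    return {
        "cpu_alloc_pct": 100.0 * int(fields["CPUAlloc"]) / int(fields["CPUTot"]),
        "mem_alloc_pct": 100.0 * int(fields["AllocMem"]) / int(fields["RealMemory"]),
        "cpu_load": float(load) if load != "N/A" else float("nan"),
    }


def allocated_gpus(fields: dict[str, str]) -> int:
    """Read the untyped 'gres/gpu=N' entry from AllocTRES (0 if absent)."""
    for item in fields.get("AllocTRES", "").split(","):
        key, _, value = item.partition("=")
        if key == "gres/gpu":
            return int(value)
    return 0
```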

Aggregate Metrics

The "Show Detail" toggle displays cluster-wide telemetry, including total CPU, GPU, and memory utilization, as well as real-time cluster power consumption (in kW).

The HPC Dashboard project, which is the software that slurmdash runs, is under active development. If you notice a bug or have questions or feedback, please email vacchelp@uvm.edu.