skill-tree:adm:3:b

ADM3 Cluster Management

Learn about managing an HPC cluster and what software can be used for aspects such as resource management, system provisioning, and monitoring.

Requirements

Learning Outcomes

  • Understand management tools such as xCAT and Clustore
  • Understand workload managers such as Slurm, SGE, or HTCondor
  • Understand parallel shell programs such as ClusterShell
  • Understand stateful and stateless images, as well as how to manage cluster nodes and deployment software
  • Understand user management software
  • Understand monitoring options such as Prometheus, Telegraf, Grafana, and Icinga
  • Understand the IPMI system
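
To make the parallel-shell outcome more concrete: such tools typically address many nodes at once through a compact host pattern. The following is a minimal sketch of that idea only, not the implementation of any particular tool. The function name `expand_nodeset` and the pattern syntax (a single bracket group holding comma-separated numbers or ranges) are assumptions chosen for illustration.

```python
import re


def expand_nodeset(pattern: str) -> list[str]:
    """Expand a pattern such as 'node[01-03,07]' into individual host names.

    Hypothetical, simplified helper: it supports one bracket group with
    comma-separated numbers or ranges and preserves zero padding.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: treat the pattern as a single host name.
        return [pattern]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        # "01-03" is a range; a bare "07" is a range of length one.
        start, _, end = part.partition("-")
        end = end or start
        width = len(start)  # keep the zero padding of the first number
        for i in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


print(expand_nodeset("node[01-03,07]"))
```

A parallel shell would then run the same command on every host in the expanded list and collect the output per node.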

Subskills

skill-tree/adm/3/b.txt · Last modified: 2025/04/16 18:30 by 127.0.0.1