ADM3 Cluster Management
Learn how to manage an HPC cluster and which software can be used for tasks such as resource management, system provisioning, and monitoring.
Requirements
Learning Outcomes
- Understand management tools such as xCAT and Clustore
- Understand workload managers such as SLURM, SGE, or HTCondor
- Understand parallel shell programs such as ClusterShell
- Understand stateful and stateless images, as well as how to manage cluster nodes and deployment software
- Understand user management software
- Understand monitoring options such as Prometheus, Telegraf, Grafana, and Icinga
- Understand IPMI (Intelligent Platform Management Interface)
Subskills
skill-tree/adm/3/b.txt · Last modified: 2025/04/16 18:30 by 127.0.0.1