skill-tree:use:2:b

Differences

This shows you the differences between two versions of the page.

- skill-tree:use:2:b [2020/06/25 20:10] – [Outcomes] kai_h
+ skill-tree:use:2:b [2025/03/10 19:24] (current) – external edit 127.0.0.1
Line 1: Line 1:
-# USE2-B Running of Parallel Programs
-# Background
-Parallel computers are operated differently than a normal PC, all users must share the system. Therefore, various operative procedures are in place. Users must understand these concepts and procedures to be able to use the available resources of a system to run a parallel application. Moreover, individual solutions can often be found in a specific system.
+# USE2 Overview: Running of Parallel Programs
 
-Batch jobs submitted to job queue define the workloads in batch systems. A workload manager of a cluster system typically deals with
-  * Job Control to provide a user interface for submitting jobs to job queues, monitoring their state during processing (e.g. to check their estimated starting time), and intervening in their execution (e.g. to abort them manually)
-  * Scheduling and Resource Management to select a waiting job for execution and to allocate nodes to the job meeting all its other demands for computing resources (memory, special processing elements like GPUs, etc.)
-  * Accounting to record historical data about how many computing resources (e.g. computing time) have been consumed by a job
+Parallel computers are operated differently than a normal PC, all users must share the system.
+Therefore, various operative procedures are in place.
+Users must understand these concepts and procedures to be able to use the available resources of a system to run a parallel application.
+Moreover, individual solutions can often be found in a specific system.
  
-Aim
-  * To enable practitioners to comprehend the concepts and procedures for running parallel applications in HPC environments
-  * To use a workload manager like SLURM or TORQUE to allocate HPC resources (e.g. CPUs) and to submit a batch job
-  * To use the system to run and monitor the execution of parallel applications on the HPC system
+## Learning Outcomes
+  * Run parallel programs in an HPC environment.
+  * Use the command-line interface.
+  * Write robust job scripts, e.g. to simplify job submissions by the help of automated job chaining.
+  * Select the appropriate software environment.
+  * Use a workload manager like SLURM or TORQUE to allocate HPC resources (e.g. CPUs) and to submit a batch job.
+  * Consider cost aspects.
+  * Measure system performance as a basis for benchmarking a parallel program.
+  * Benchmark a parallel program.
+  * Tune a parallel program from the outside via runtime options.
+  * Apply the workflow for tuning.
+  * Use the command-line interface.
+  * Write robust job scripts, e.g. to simplify job submissions by the help of automated job chaining.
+  * Select the appropriate software environment.
+  * Use a workload manager to allocate HPC resources for running a parallel program interactively.
+  * Recognize cost aspects.
+  * Measure system performance as a basis for benchmarking a parallel program.
+  * Benchmark a parallel program.
+  * Tune a parallel program from the outside via runtime options.
+  * Apply the workflow for tuning.
  
-
-Outcomes
-  *  run parallel programs in an HPC environment
-  *  use the command line interface
-  *  write robust job scripts, e.g. to simplify job submissions by the help of automated job chaining
-  *  select the appropriate software environment
-  *  use a workload manager like SLURM or TORQUE to allocate HPC resources (e.g. CPUs) and to submit a batch job
-    * Job submission and cancellation (SLURM)
-      * sbatch
-      * salloc
-      * srun
-    * Monitoring job and system information (SLURM)
-      * sinfo
-      * squeue
-      * sstat
-      * scontrol
-    * Retrieving accounting information
-      * sacct
-      * sacctmgr
-  *  consider cost aspects
-  *  measure system performance as a basis for benchmarking a parallel program
-  *  benchmark a parallel program
-  *  tune a parallel program from the outside via runtime options
-  *  apply the workflow for tuning
-
-# Subskills
-  * [[skill-tree:use:1:4:b]]
+## Subskills
+  * [[skill-tree:k:4:1:b]]
+  * [[skill-tree:use:2:i]]
   * [[skill-tree:use:1:1:b]]
   * [[skill-tree:use:1:3:b]]
+  * [[skill-tree:use:1:4:b]]
   * [[skill-tree:pe:1:b]]
   * [[skill-tree:pe:2:b]]
   * [[skill-tree:pe:3:b]]
   * [[skill-tree:k:4:b]]
 +
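Both revisions list "automated job chaining" as a way to simplify job submissions. The page does not say how to do it, so the following is only a minimal sketch, assuming SLURM as the workload manager. It uses `sbatch --parsable` to read back each job ID and `--dependency=afterok:<jobid>` so that each job starts only after the previous one finished successfully. The helper names and the script names (`pre.sh`, `main.sh`, `post.sh`) are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional


def build_sbatch_command(script: str, after_ok: Optional[str] = None) -> List[str]:
    """Build the sbatch argument list for one job script (hypothetical helper)."""
    # --parsable makes sbatch print just the job ID (optionally "jobid;cluster").
    cmd = ["sbatch", "--parsable"]
    if after_ok is not None:
        # Start this job only after the given job completed with exit code 0.
        cmd.append(f"--dependency=afterok:{after_ok}")
    cmd.append(script)
    return cmd


def parse_job_id(stdout: str) -> str:
    """Extract the job ID from the output of `sbatch --parsable`."""
    return stdout.strip().split(";")[0]


def submit_chain(scripts: List[str], run: Callable = subprocess.run) -> List[str]:
    """Submit the scripts in order, each depending on the one before it."""
    job_ids: List[str] = []
    previous: Optional[str] = None
    for script in scripts:
        result = run(
            build_sbatch_command(script, previous),
            check=True,           # raise if sbatch itself fails
            capture_output=True,  # collect stdout so the job ID can be parsed
            text=True,
        )
        previous = parse_job_id(result.stdout)
        job_ids.append(previous)
    return job_ids


if __name__ == "__main__":
    # Assumed script names; requires a system where sbatch is available.
    print(submit_chain(["pre.sh", "main.sh", "post.sh"]))
```

The `run` parameter exists only so the chain logic can be exercised without a cluster; on a real system the default `subprocess.run` is used. Sites often have their own conventions for dependencies and queues, so the local documentation takes precedence.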
skill-tree/use/2/b.1593108655.txt.gz · Last modified: 2020/06/25 20:10 by kai_h