skill-tree:k:4:1:b

Differences

This shows you the differences between two versions of the page.

skill-tree:k:4:1:b [2020/06/25 20:18] – [Background] kai_h
skill-tree:k:4:1:b [2025/04/16 18:30] (current) – external edit 127.0.0.1
Line 1: Line 1:
-# K4.1-B Introduction to job scheduling +# K4.1 Basic principles of Job Scheduling
-# Background +
-This skill provides an overview into the scheduling of jobs on a supercomputer. +
-It covers generic and widely used concepts that serve the purpose to maximize the efficiency of a supercomputer.+
  
-Batch jobs submitted to a job queue define the workloads in batch systems. A workload +This skill provides an overview of the scheduling of jobs on a supercomputer.
-manager of a cluster system typically deals with +It covers generic and widely used concepts that serve to maximize the efficiency of a supercomputer.
-  * Job Control to provide a user interface for submitting jobs to job queues, monitoring their state during processing (e.g. to check their estimated starting time), and intervening in their execution (e.g. to abort them manually) +
-  * Scheduling and Resource Management to select waiting job for execution and to allocate nodes to the job meeting all its other demands for computing resources (memory, special processing elements like GPUs, etc.+
-  * Accounting to record historical data about how many computing resources (e.g. computing time) have been consumed by a job+
  
 +Batch jobs submitted to a job queue define the workloads in batch systems.
 +A workload manager of a cluster system typically deals with:
 +    * Job Control to provide a user interface for submitting jobs to job queues, monitoring their state during processing (e.g. to check their estimated starting time), and intervening in their execution (e.g. to abort them manually)
 +    * Scheduling and Resource Management to select a waiting job for execution and to allocate nodes to the job meeting all its other demands for computing resources (memory, special processing elements like GPUs, etc.)
 +    * Accounting to record historical data about how many computing resources (e.g. computing time) have been consumed by a job
  
 +## Learning Outcomes
  
-# Aim +* Comprehend the exclusive and shared usage model in HPC. 
-To enable practitioners to comprehend and describe the basic architecture and concepts of resource allocation for an HPC system+* Differentiate batch and interactive job submission. 
 +* Comprehend the generic concepts and architecture of resource manager, scheduler, job and job script. 
 +* Explain environment variables as a means to communicate. 
 +* Comprehend accounting principles. 
 +* Explain the generic steps to run and monitor a single job. 
 +* Comprehend scheduling principles (first come first served, shortest job first, backfilling) to achieve objectives like minimizing the average elapsed program runtime and maximizing the utilization of the available HPC resources. 
 +* Comprehend the differences between **Batch Systems** and **Time-Sharing Systems**. 
 +* Explain the concepts and procedures for resource allocation and job execution in an HPC environment. 
 +* Run interactive jobs and batch jobs. 
 +* Comprehend and describe the expected behavior of job scripts. 
 +* Change provided job scripts and embed them into shell scripts to run a variety of parallel applications. 
 +* Analyze the output generated from a job scheduler and describe the cause of typically generated errors. 
 +* Comprehend accounting principles (billing for the jobs). 
 +* Comprehend the set of terms for performance criteria like: 
 +    * Resource Utilization. 
 +    * Throughput. 
 +    * Waiting Time. 
 +    * Execution Time. 
 +    * Turnaround Time. 
 +* Comprehend scheduling strategies that increase productivity. 
 +* Comprehend that typical goals of job scheduling are: 
 +    * Maximization of resource utilization. 
 +    * Maximization of throughput. 
 +    * Minimization of waiting time. 
 +    * Minimization of turnaround time. 
 +* Comprehend that there is a variety of scheduling algorithms from rather simple to more complex like: 
 +    * First-Come-First-Served (FCFS). 
 +    * Shortest-Job-First (SJF). 
 +    * Priority-based. 
 +    * Fair-Share. 
 +    * Backfilling. 
 +* Apply advanced scheduling principles (e.g. backfilling) to achieve objectives like minimizing the average elapsed program runtime and maximizing the utilization of the available HPC resources. 
 +* Discuss sophisticated scheduling principles (e.g. fair share) to achieve objectives like treating users fairly and maximizing the utilization of the available HPC resources.
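
The waiting-time and turnaround-time criteria and the FCFS and SJF strategies named in the outcomes above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a single resource that runs one job at a time, no preemption, and runtimes known in advance. The names `Job`, `simulate`, `fcfs`, `sjf`, and `mean_wait`, as well as the three example jobs, are hypothetical and not part of the skill definition or of any real workload manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    submit: int   # time at which the job enters the queue
    runtime: int  # execution time, assumed to be known in advance


def simulate(jobs, pick):
    """Run jobs one at a time on a single resource (assumption).

    `pick` selects the next job from the jobs that are already waiting.
    Returns {job name: (waiting time, turnaround time)}.
    """
    pending = sorted(jobs, key=lambda j: j.submit)
    clock = 0
    results = {}
    while pending:
        waiting = [j for j in pending if j.submit <= clock]
        if not waiting:
            # Resource is idle: jump to the next submission.
            clock = pending[0].submit
            continue
        job = pick(waiting)
        pending.remove(job)
        start = clock
        clock = start + job.runtime
        results[job.name] = (start - job.submit, clock - job.submit)
    return results


def fcfs(waiting):
    """First-Come-First-Served: earliest submission runs next."""
    return min(waiting, key=lambda j: j.submit)


def sjf(waiting):
    """Shortest-Job-First (non-preemptive): shortest runtime runs next."""
    return min(waiting, key=lambda j: j.runtime)


def mean_wait(results):
    """Average waiting time over all jobs."""
    return sum(wait for wait, _ in results.values()) / len(results)


# Hypothetical workload: one long job followed by two shorter ones.
jobs = [Job("A", 0, 8), Job("B", 1, 4), Job("C", 2, 1)]

for label, policy in (("FCFS", fcfs), ("SJF", sjf)):
    outcome = simulate(jobs, policy)
    print(label, outcome, "mean wait:", round(mean_wait(outcome), 2))
```

With this workload, SJF lets the short job C overtake B once A has finished, which lowers the average waiting time compared to FCFS. Real schedulers work with multiple nodes and user-supplied runtime estimates, so this toy model only conveys the principle.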
  
-# Outcomes 
-  * comprehend the exclusive and shared usage model in HPC 
-  * differentiate batch and interactive job submission 
-  * comprehend the generic concepts and architecture of resource manager, scheduler, job and job script 
-  * explain environment variables as a means to communicate 
-  * comprehend accounting principles 
-  * explain the generic steps to run and monitor a single job 
-  * comprehend scheduling principles (first come first served, shortest job first, backfilling) to achieve objectives like minimizing the averaged elapsed program runtimes, and maximizing the utilization of the available HPC resources 
  
-# Subskills 
  
skill-tree/k/4/1/b.1593109121.txt.gz · Last modified: 2020/06/25 20:18 by kai_h