User Tools

Site Tools


skill-tree:k:4:1:b

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
skill-tree:k:4:1:b [2020/07/14 00:40] lucianaskill-tree:k:4:1:b [2025/04/16 18:30] (current) – external edit 127.0.0.1
Line 1: Line 1:
-# K4.1-B Introduction to Job Scheduling +# K4.1 Basic principles of Job Scheduling 
-# Background+
 This skill provides an overview of the scheduling of jobs on a supercomputer. This skill provides an overview of the scheduling of jobs on a supercomputer.
 It covers generic and widely used concepts that serve the purpose to maximize the efficiency of a supercomputer. It covers generic and widely used concepts that serve the purpose to maximize the efficiency of a supercomputer.
  
-Batch jobs submitted to a job queue define the workloads in batch systems. A workload +Batch jobs submitted to a job queue define the workloads in batch systems. 
-manager of a cluster system typically deals with +A workload manager of a cluster system typically deals with: 
-  * Job Control to provide a user interface for submitting jobs to job queues, monitoring their state during processing (e.g. to check their estimated starting time), and intervening in their execution (e.g. to abort them manually) +    * Job Control to provide a user interface for submitting jobs to job queues, monitoring their state during processing (e.g. to check their estimated starting time), and intervening in their execution (e.g. to abort them manually) 
-  * Scheduling and Resource Management to select a waiting job for execution and to allocate nodes to the job meeting all its other demands for computing resources (memory, special processing elements like GPUs, etc.) +    * Scheduling and Resource Management to select a waiting job for execution and to allocate nodes to the job meeting all its other demands for computing resources (memory, special processing elements like GPUs, etc.) 
-  * Accounting to record historical data about how many computing resources (e.g. computing time) have been consumed by a job +    * Accounting to record historical data about how many computing resources (e.g. computing time) have been consumed by a job
  
 +## Learning Outcomes
  
-# Aim +* Comprehend the exclusive and shared usage model in HPC. 
-To enable practitioners to comprehend and describe the basic architecture and concepts of resource allocation for an HPC system+* Differentiate batch and interactive job submission. 
 +* Comprehend the generic concepts and architecture of resource manager, scheduler, job and job script. 
 +* Explain environment variables as a means to communicate. 
 +* Comprehend accounting principles. 
 +* Explain the generic steps to run and monitor a single job. 
 +* Comprehend scheduling principles (first come first served, shortest job first, backfilling) to achieve objectives like minimizing the averaged elapsed program runtimes, and maximizing the utilization of the available HPC resources. 
 +* Comprehend the differences between **Batch Systems** and **Time-Sharing Systems**. 
 +* Explain the concepts and procedures for resource allocation and job execution in an HPC environment. 
 +* Run interactive jobs and batch jobs. 
 +* Comprehend and describe the expected behavior of job scripts. 
 +* Change provided job scripts and embed them into shell scripts to run a variety of parallel applications. 
 +* Analyze the output generated from a job scheduler and describe the cause of typically generated errors. 
 +* Comprehend accounting principles (billing for the jobs). 
 +* Comprehend the set of terms for performance criteria like: 
 +    * Resource Utilization. 
 +    * Throughput. 
 +    * Waiting Time. 
 +    * Execution Time. 
 +    * Turnaround Time. 
 +* Comprehend scheduling strategies that increase productivity. 
 +* Comprehend that typical goals of job scheduling are: 
 +    * Maximization of resource utilization. 
 +    * Maximization of throughput. 
 +    * Minimization of waiting time. 
 +    * Minimization of turnaround time. 
 +* Comprehend that there is a variety of scheduling algorithms from rather simple to more complex like: 
 +    * First-Come-First-Served (FCFS). 
 +    * Shortest-Job-First (SJF). 
 +    * Priority-based. 
 +    * Fair-Share. 
 +    * Backfilling. 
 +* Apply advanced scheduling principles (e.g. backfilling) to achieve objectives like minimizing the averaged elapsed program runtimes, and maximizing the utilization of the available HPC resources. 
 +* Discuss sophisticated scheduling principles (e.g. fair share) to achieve objectives like treating the users fair, and maximizing the utilization of the available HPC resources.
  
-# Outcomes 
-  * Comprehend the exclusive and shared usage model in HPC 
-  * Differentiate batch and interactive job submission 
-  * Comprehend the generic concepts and architecture of resource manager, scheduler, job and job script 
-  * Explain environment variables as a means to communicate 
-  * Comprehend accounting principles 
-  * Explain the generic steps to run and monitor a single job 
-  * Comprehend scheduling principles (first come first served, shortest job first, backfilling) to achieve objectives like minimizing the averaged elapsed program runtimes, and maximizing the utilization of the available HPC resources 
  
-# Subskills 
  
skill-tree/k/4/1/b.1594680018.txt.gz · Last modified: 2020/07/14 00:40 by luciana