# K4.1 Basic principles of Job Scheduling

This skill provides an overview of job scheduling on a supercomputer. It covers generic and widely used concepts that aim to maximize the efficiency of a supercomputer.

In a batch system, the workload consists of batch jobs submitted to a job queue. The workload manager of a cluster system typically deals with:

* Job Control: providing a user interface for submitting jobs to job queues, monitoring their state during processing (e.g. to check their estimated starting time), and intervening in their execution (e.g. to abort them manually).
* Scheduling and Resource Management: selecting a waiting job for execution and allocating nodes to it while meeting all its other demands for computing resources (memory, special processing elements like GPUs, etc.).
* Accounting: recording historical data about how many computing resources (e.g. computing time) a job has consumed.

## Learning Outcomes

* Comprehend the exclusive and shared usage models in HPC.
* Differentiate batch and interactive job submission.
* Comprehend the generic concepts and architecture of resource manager, scheduler, job and job script.
* Explain environment variables as a means to communicate.
* Comprehend accounting principles.
* Explain the generic steps to run and monitor a single job.
* Comprehend scheduling principles (first come first served, shortest job first, backfilling) to achieve objectives like minimizing the average elapsed program runtime and maximizing the utilization of the available HPC resources.
* Comprehend the differences between **Batch Systems** and **Time-Sharing Systems**.
* Explain the concepts and procedures for resource allocation and job execution in an HPC environment.
* Run interactive jobs and batch jobs.
* Comprehend and describe the expected behavior of job scripts.
* Change provided job scripts and embed them into shell scripts to run a variety of parallel applications.
* Analyze the output generated by a job scheduler and describe the cause of typical errors.
* Comprehend accounting principles (billing for the jobs).
* Comprehend the set of terms for performance criteria such as:
    * Resource Utilization.
    * Throughput.
    * Waiting Time.
    * Execution Time.
    * Turnaround Time.
* Comprehend scheduling strategies that increase productivity.
* Comprehend that typical goals of job scheduling are:
    * Maximization of resource utilization.
    * Maximization of throughput.
    * Minimization of waiting time.
    * Minimization of turnaround time.
* Comprehend that there is a variety of scheduling algorithms, from rather simple to more complex, such as:
    * First-Come-First-Served (FCFS).
    * Shortest-Job-First (SJF).
    * Priority-based.
    * Fair-Share.
    * Backfilling.
* Apply advanced scheduling principles (e.g. backfilling) to achieve objectives like minimizing the average elapsed program runtime and maximizing the utilization of the available HPC resources.
* Discuss sophisticated scheduling principles (e.g. fair share) to achieve objectives like treating users fairly and maximizing the utilization of the available HPC resources.
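
To make the performance criteria above concrete, the following minimal Python sketch compares First-Come-First-Served and Shortest-Job-First on a single resource. It is an illustration under simplifying assumptions, not how any particular workload manager works: all jobs are submitted at time 0, each job occupies the whole machine, and runtimes are known exactly. The `Job` class, the function names, and the job data are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    runtime: float  # hours; assumed to be known exactly in advance


def schedule(jobs, policy="fcfs"):
    """Return {job name: (waiting time, turnaround time)}.

    Assumption: all jobs are submitted at time 0 and run one after another.
    """
    if policy == "fcfs":
        order = list(jobs)                            # keep submission order
    elif policy == "sjf":
        order = sorted(jobs, key=lambda j: j.runtime)  # shortest job first
    else:
        raise ValueError(f"unknown policy: {policy}")

    clock = 0.0
    result = {}
    for job in order:
        waiting = clock                 # time spent in the queue
        clock += job.runtime            # job runs to completion
        result[job.name] = (waiting, clock)  # turnaround = waiting + execution
    return result


def averages(result):
    """Return (average waiting time, average turnaround time)."""
    n = len(result)
    avg_wait = sum(w for w, _ in result.values()) / n
    avg_turnaround = sum(t for _, t in result.values()) / n
    return avg_wait, avg_turnaround


# Hypothetical workload: three jobs in submission order.
jobs = [Job("A", 8), Job("B", 4), Job("C", 1)]
for policy in ("fcfs", "sjf"):
    avg_wait, avg_turnaround = averages(schedule(jobs, policy))
    print(f"{policy}: average waiting {avg_wait:.2f} h, "
          f"average turnaround {avg_turnaround:.2f} h")
```

For this made-up workload, FCFS yields an average waiting time of about 6.67 h and an average turnaround time of 11.00 h, while SJF yields 2.00 h and about 6.33 h. The total execution time (13 h) is the same in both cases; only the order, and therefore the waiting, changes.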
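
Backfilling can be sketched in a similar way. The Python example below is a simplified, EASY-style backfilling simulation and not the algorithm of any specific scheduler. It assumes that all jobs are submitted at time 0, that the requested runtime equals the actual runtime, and that only the job at the head of the queue holds a reservation. The `Job` tuple, the `simulate` function, and the workload are hypothetical.

```python
from collections import namedtuple

# runtime is in hours and assumed to be exact (requested == actual)
Job = namedtuple("Job", "name nodes runtime")


def simulate(jobs, total_nodes, backfill=False):
    """Return {job name: start time} for jobs all submitted at time 0."""
    if any(job.nodes > total_nodes for job in jobs):
        raise ValueError("a job requests more nodes than the cluster has")
    queue = list(jobs)   # waiting jobs in submission order
    running = []         # (end time, nodes) of running jobs
    start = {}
    now = 0

    def launch(job):
        start[job.name] = now
        running.append((now + job.runtime, job.nodes))

    while queue:
        free = total_nodes - sum(nodes for _, nodes in running)
        # FCFS part: start jobs from the head of the queue while they fit.
        while queue and queue[0].nodes <= free:
            job = queue.pop(0)
            launch(job)
            free -= job.nodes
        if not queue:
            break
        if backfill:
            head = queue[0]
            # Shadow time: earliest time at which the head job can start.
            available = free
            for shadow, nodes in sorted(running):
                available += nodes
                if available >= head.nodes:
                    break
            spare = available - head.nodes  # nodes the head job leaves unused
            for job in queue[1:]:
                ends_in_time = now + job.runtime <= shadow
                # A later job may jump ahead only if it cannot delay the head.
                if job.nodes <= free and (ends_in_time or job.nodes <= spare):
                    queue.remove(job)
                    launch(job)
                    free -= job.nodes
                    if not ends_in_time:
                        spare -= job.nodes
        # Advance the clock to the next completion and release its nodes.
        now = min(end for end, _ in running)
        running = [(end, nodes) for end, nodes in running if end > now]
    return start


# Hypothetical workload on a 4-node cluster, in submission order.
jobs = [Job("A", 2, 4), Job("B", 4, 2), Job("C", 2, 3), Job("D", 1, 6)]
for use_backfill in (False, True):
    starts = simulate(jobs, total_nodes=4, backfill=use_backfill)
    avg_wait = sum(starts.values()) / len(starts)
    print(f"backfill={use_backfill}: starts={starts}, average waiting {avg_wait} h")
```

With this invented workload, job C fits into the two nodes that would otherwise idle while B waits for A to finish, so it starts at time 0 instead of 6. Job B still starts at time 4 in both runs, and the average waiting time drops from 4.0 h to 2.5 h.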