K1.2 Overview Hardware Architectures
HPC computer architectures are parallel computer architectures. A parallel computer is built out of:
- Compute units.
- Main memory.
- A high-speed network.
Learning objectives
- elementary processing elements like CPUs, GPUs, many-core architectures, vector systems, and FPGAs
- the NUMA architecture used for symmetric multiprocessing systems where the memory access time depends on the memory location relative to the processor
- network demands for HPC systems (e.g. high bandwidth and low latency)
- typical network architectures used for HPC systems, like Gigabit Ethernet (1 or 10 Gbit/s) or InfiniBand
- Comprehend that, although CPU stands for Central Processing Unit, a traditional CPU no longer contains a central, i.e. single, processing unit, because today's CPUs have multiple compute cores that all provide the same functionality
- Comprehend that GPUs (Graphics Processing Units) or GPGPUs (General-Purpose Graphics Processing Units) were originally used for image processing and displaying images on screens before people started to utilize the computing power of GPUs for other purposes
- Comprehend that FPGAs (Field-Programmable Gate Arrays) are devices with configurable hardware whose configurations are specified in hardware description languages (e.g. VHDL or Verilog)
- Comprehend that FPGAs are interesting if one uses them to implement hardware features that are not available in CPUs or GPUs (e.g. low-precision arithmetic that needs only a few bits)
- Comprehend that vector units are the successors of vector computers (i.e. the first generation of supercomputers) and that they are intended to provide higher memory bandwidth than CPUs
- Comprehend that at an abstract level the high-speed network connects compute units and main memory, which leads to three main parallel computer architectures:
  - Shared memory, where all compute units can directly access the whole main memory
  - Distributed memory, where individual computers are connected with a network
  - NUMA (Non-Uniform Memory Access), which combines properties of shared and distributed memory systems: at the hardware level a NUMA system resembles a distributed memory system, while software sees a single shared address space
- Comprehend that in general, the effort for programming parallel applications for distributed systems is higher than for shared memory systems
- parallelization techniques at the instruction level of a processing element (e.g. pipelining, SIMD processing)
- advanced instruction sets that improve parallelization (e.g., AVX-512)
- hybrid approaches, e.g. combining CPUs with GPUs or FPGAs
- typical network topologies and architectures used for HPC systems, like fat trees based on switched fabrics using e.g. Gigabit Ethernet (1 or 10 Gbit/s) or InfiniBand
- special or application-specific hardware (e.g. TPUs)
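The network demands listed above (high bandwidth and low latency) can be made concrete with the widely used latency-bandwidth cost model, in which sending a message takes roughly the startup latency plus the message size divided by the bandwidth. The following is a minimal sketch in Python, not a benchmark: the function name `transfer_time` and the two link parameter sets are assumptions chosen for illustration (order-of-magnitude figures, not measurements of any specific product).

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message: startup latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical links with illustrative figures only (not vendor data).
LINKS = {
    "slow_link": {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6},  # about 1 Gbit/s
    "fast_link": {"latency_s": 1e-6, "bandwidth_bytes_per_s": 12.5e9},  # about 100 Gbit/s
}


def compare(message_bytes):
    """Return the estimated transfer time in seconds for each hypothetical link."""
    return {name: transfer_time(message_bytes, **params)
            for name, params in LINKS.items()}


if __name__ == "__main__":
    for size in (8, 1_000_000):
        for name, seconds in compare(size).items():
            print(f"{size:>9} bytes over {name}: {seconds * 1e6:10.2f} microseconds")
```

In this model, small messages are dominated by the latency term and large messages by the bandwidth term, which is why HPC networks need both properties.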
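The difference in programming effort between shared and distributed memory systems mentioned above can be illustrated with a small sketch. It uses Python threads for both variants, so it only mimics the two programming styles on one machine: `shared_memory_sum` and `message_passing_sum` are hypothetical helper names, and the queues stand in for the explicit send and receive operations that a real distributed program would perform over the network (for example with a message-passing library).

```python
import queue
import threading


def shared_memory_sum(data, n_workers):
    """Shared-memory style: every worker reads the same list directly."""
    partial = [0] * n_workers  # one slot per worker, so no lock is needed

    def work(rank):
        partial[rank] = sum(data[rank::n_workers])

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum(data, n_workers):
    """Distributed-memory style: workers only see data that was sent to them."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()

    def work(rank):
        chunk = inboxes[rank].get()  # explicit receive
        results.put(sum(chunk))      # explicit send of the partial result

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for rank in range(n_workers):
        inboxes[rank].put(list(data[rank::n_workers]))  # explicit send of a copy
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_workers))


if __name__ == "__main__":
    numbers = list(range(1, 101))
    print(shared_memory_sum(numbers, 4), message_passing_sum(numbers, 4))
```

Both functions compute the same result, but the message-passing variant has to distribute the data and collect the partial results explicitly, which is the additional effort the learning objective refers to.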
Subskills
skill-tree/k/1/2/b.txt · Last modified: 2024/09/11 12:30 by 127.0.0.1