# K1.2 Overview Hardware Architectures HPC computer architectures are parallel computer architectures. A parallel computer is built out of * Compute units. * Main memory. * A high-speed network. ## Learning objectives * elementary processing elements like CPUs, GPUs, many-core architectures * vector systems, and FPGAs * the NUMA architecture used for symmetric multiprocessing systems where the memory access time depends on the memory location relative to the processor * network demands for HPC systems (e.g. high bandwidth and low latency) * typical network architectures used for HPC systems, like fast Ethernet (1 or 10 Gbit) or InfiniBand * Comprehend that in traditional **CPUs** - although CPU stands for Central Processing Unit - there is no central, i.e. single, processing unit any more because today all CPUs have multiple compute cores which all have the same functionality * Comprehend that **GPUs** (Graphical Processing Units) or **GPGPUs** (General Purpose Graphical Processing Units) were originally used for image processing and displaying images on screens before people started to utilize the computing power of GPUs for other purposes * Comprehend that **FPGAs** (Field-Programmable Gate Arrays) are devices that have configurable hardware and configurations are specified by hardware description languages * Comprehend that **FPGAs** are interesting if one uses them to implement hardware features that are not available in CPUs or GPUs (e.g. low precision arithmetic that needs only a few bits) * Comprehend that **Vector units** are successors of vector computers (i.e. the first generation of supercomputers) and that they are supposed to provide higher memory bandwidth than CPUs * Comprehend that at an abstract level the high-speed network connects compute units and main memory which leads to three main parallel computer architectures * **Shared Memory** where all compute units can directly access the whole main memory * **Distributed memory** where individual computers are connected with a network * **NUMA** (Non-Uniform Memory Access) combines properties from shared and distributed memory systems, because at the hardware level a NUMA system resembles a distributed memory * Comprehend that in general, the effort for programming parallel applications for distributed systems is higher than for shared memory systems * parallelization techniques at the instruction level of a processing element (e.g. pipelining, SIMD processing) * advanced instruction sets that improve parallelization (e.g., AVX-512) * hybrid approaches, e.g. combining CPUs with GPUs or FPGAs * typical network topologies and architectures used for HPC systems, like fat trees based on switched fabrics using e.g. fast Ethernet (1 or 10 Gbit) or InfiniBand * special or application-specific hardware (e.g. TPUs) ## Subskills