skill-tree:bda:2:b
Differences
This shows you the differences between two versions of the page.
skill-tree:bda:2:b [2020/06/18 20:15] – external edit 127.0.0.1 | skill-tree:bda:2:b [2025/03/10 19:24] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | # BDA2-B Big Data Tools in HPC | + | # BDA2 Overview |
- | # Background | + | |
- | There is a vast number of tools in different categories available that aid big data analytics -- but only some are production relevant in an HPC environment. | + | |
- | # Aim | + | Big Data Tools in High-Performance Computing (HPC) leverage the immense computational power and resources of HPC systems |
- | * To enable practitioners | + | |
- | # Outcomes | + | **Ophidia (BDA6.2):** Ophidia is a big data analytics framework specifically designed for HPC environments, |
- | * Distinguish the benefits and drawbacks of various big data tools in the HPC environment | + |
- | * Apply a data science workflow on existing data using various big data tools | + | **Jupyter Notebooks (BDA6.3):** Jupyter Notebooks provide an interactive computing environment for data analysis, visualization, |
- | * Construct simple data science workflows | + | |
+ | **Cloud (BDA6.4):** Cloud computing platforms offer scalable and on-demand resources for big data analytics, complementing traditional HPC infrastructures. This section discusses the use of cloud services for big data processing, storage, and analysis, including platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Topics also include hybrid cloud-HPC architectures, | ||
+ | |||
+ | **RayDP (BDA6.5):** RayDP is a distributed computing framework that combines the power of Apache Spark and Ray for scalable data processing and machine learning tasks. This section explores the integration of RayDP with HPC environments, | ||
+ | |||
+ | **Spark-Horovod (BDA6.6):** Spark-Horovod is an integration of Apache Spark with Horovod, a distributed deep learning framework. This section discusses the use of Spark-Horovod for large-scale deep learning tasks in HPC environments, | ||
+ | |||
+ | By leveraging big data tools in HPC environments, | ||
+ | |||
+ | ## Learning | ||
+ | * Distinguish the benefits and drawbacks of various big data tools in the HPC environment. | ||
+ | * Apply a data science workflow on existing data using various big data tools. | ||
+ | * Construct simple data science workflows. | ||
+ | |||
+ | ## Subskills | ||
- | # Subskills | ||
- | * [[skill-tree: |
skill-tree/bda/2/b.1592504111.txt.gz · Last modified: 2020/06/18 20:15 by 127.0.0.1