Workshop Recap: Introduction to High-Performance Computing (HPC)

On May 6, 2024, Dr. Loris Bennett from FUB-IT at Freie Universität Berlin held the workshop Introduction to High-Performance Computing (HPC) at WI. In this workshop, he gave an overview of the mechanics of HPC and enabled participants to try it out themselves. While the workshop used the HPC cluster provided by FUB-IT as a practical example, most of the content applied to HPC in general.

Dr. Bennett began with definitions of HPC and its core concepts. He described an HPC system as a cluster of servers providing cores, memory, and storage, connected by high-speed interconnects. These resources are shared among users and allocated by the system itself. Users send jobs consisting of one or more tasks to the HPC cluster. Each task runs on a single compute server, also called a node, and can make use of multiple cores, up to the maximum available on a node. The number of tasks per node can be set for each job, but defaults to one. Lastly, an HPC cluster may provide different file systems for different purposes. For example, the file system /home is optimized for large numbers of small files, such as programs, scripts, and results, while /scratch is optimized for the temporary storage of small numbers of large files.
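To make these concepts concrete, here is a minimal sketch of a batch script describing a job with a single four-core task on one node. It assumes a cluster managed by the widely used Slurm scheduler (the recap does not name the scheduler, so treat Slurm as an assumption), and the program name and scratch path are hypothetical:

    #!/bin/bash
    # One task on one node, using four of the node's cores.
    #SBATCH --job-name=example
    #SBATCH --nodes=1              # each task runs on a single node
    #SBATCH --ntasks-per-node=1    # tasks per node (the default is one)
    #SBATCH --cpus-per-task=4      # cores for this task, up to the node maximum

    # Programs and scripts live under /home; large temporary data goes
    # to /scratch (the exact path below is hypothetical).
    cd /scratch/$USER
    srun ./my_program              # my_program stands for the user's executable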

Next, Dr. Bennett turned to resource management. When launching a job, many parameters can be set, such as the number of CPU and GPU cores, the amount of memory, and the maximum run time. To determine the resources a job actually requires, users need to run a few jobs and check what was actually used. This information can then be used to set the requirements of future jobs and thus ensure that resources are used efficiently. The priority of a job dictates when it is likely to start and depends mainly on the amount of resources the user has consumed in the last month. A Quality of Service (QoS) can be set per job to increase its priority, but jobs within a given QoS are restricted in the total amount of resources they can use. In addition, it is possible to parallelize tasks by splitting them into subtasks that can be performed simultaneously. Likewise, many similar jobs can be scheduled efficiently using job arrays.
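As a sketch of what such resource requests can look like, again under the Slurm assumption made above, the following batch script requests explicit resources and runs ten similar subtasks as a job array. The QoS name, resource amounts, and input-file naming are illustrative, not values from the workshop:

    #!/bin/bash
    # A job array of ten similar subtasks with explicit resource requests.
    #SBATCH --job-name=array-example
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2
    #SBATCH --mem=4G               # memory per node
    #SBATCH --time=01:00:00        # maximum run time (hh:mm:ss)
    #SBATCH --gres=gpu:1           # one GPU, if the cluster offers them
    #SBATCH --qos=standard         # QoS names are site-specific; this one is hypothetical
    #SBATCH --array=1-10           # ten subtasks, distinguished by an index

    # Each array element processes a different input file (hypothetical naming).
    srun ./process input_${SLURM_ARRAY_TASK_ID}.dat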

Finally, participants could log into the FUB-IT HPC cluster themselves, either via the command line or via graphical interface tools, and submit their first sample jobs. They were shown how to write batch files defining job parameters, how to use commands to submit, inspect, or cancel jobs, and how to check the results and efficiency of a completed job.
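Under the same Slurm assumption, that command-line workflow might look as follows, where job.sh stands for a batch file like the sketches above and 12345 is a placeholder job ID:

    sbatch job.sh        # submit the batch file; the scheduler replies with a job ID
    squeue -u $USER      # show your pending and running jobs
    scancel 12345        # cancel a job by its ID
    seff 12345           # summarize the efficiency of a completed job, where seff is installed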

The Methods Lab would like to thank Dr. Bennett for his concise but comprehensive introduction to HPC!