Introduction to High Performance Computing

What is a cluster?

A compute cluster is a group of uniformly configured compute nodes connected over a network. The workloads that run on these systems generally need more computational power or memory than a single workstation can provide. Once clustered together, these nodes work as a single system.

A cluster can have an arbitrary number of compute nodes, plus a few "master" nodes that process logins and coordinate the deployment of programs and data to jobs.

Schedulers

Because many users want to use the cluster at the same time, a special software tool called the scheduler distributes the available computational resources fairly among them. For example, cryptographers may primarily need CPU power, while deep learning researchers may need CPU power as well as GPU acceleration. Schedulers work together with a queueing system that distributes jobs among the nodes in the cluster as resources become available.
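To make the queueing idea concrete, here is a minimal Python sketch of a toy scheduler. It is not how PBS or any production scheduler is implemented: the `Node`, `Job` and `simulate` names, the CPU/GPU resource model, the integer time steps, single-node jobs, and the strict first-in-first-out policy are all assumptions chosen for illustration. Jobs wait in a queue and are started, in order, on the first node with enough free CPUs and GPUs; when a job finishes, its resources are released and the next waiting job can start.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: int
    gpus: int = 0


@dataclass
class Job:
    name: str
    cpus: int
    gpus: int
    duration: int  # abstract time steps (assumption: >= 1)


def simulate(nodes, jobs):
    """Toy FIFO scheduler (hypothetical): start queued jobs in submission
    order as soon as some node has room for them.

    Returns a dict mapping job name -> (node name, start time).
    """
    # Reject jobs that could never run, otherwise the loop would wait forever.
    for job in jobs:
        if not any(n.cpus >= job.cpus and n.gpus >= job.gpus for n in nodes):
            raise ValueError(f"job {job.name!r} does not fit on any node")

    free = {n.name: [n.cpus, n.gpus] for n in nodes}  # free [cpus, gpus] per node
    queue = deque(jobs)
    running = []  # entries: (end_time, node_name, job)
    schedule = {}
    time = 0
    while queue or running:
        # Return the resources of jobs that have finished by now.
        for _end, node_name, job in [r for r in running if r[0] <= time]:
            free[node_name][0] += job.cpus
            free[node_name][1] += job.gpus
        running = [r for r in running if r[0] > time]

        # Start jobs from the head of the queue while they fit on some node.
        while queue:
            job = queue[0]
            node_name = next(
                (name for name, (c, g) in free.items()
                 if c >= job.cpus and g >= job.gpus),
                None,
            )
            if node_name is None:
                break  # head job must wait; strict FIFO holds back the rest
            queue.popleft()
            free[node_name][0] -= job.cpus
            free[node_name][1] -= job.gpus
            running.append((time + job.duration, node_name, job))
            schedule[job.name] = (node_name, time)
        time += 1
    return schedule


if __name__ == "__main__":
    nodes = [Node("cpu01", cpus=32), Node("gpu01", cpus=32, gpus=4)]
    jobs = [
        Job("crypto", cpus=32, gpus=0, duration=5),  # CPU-heavy job
        Job("train", cpus=16, gpus=4, duration=3),   # needs GPU acceleration
        Job("crypto2", cpus=32, gpus=0, duration=2), # waits for free CPUs
    ]
    print(simulate(nodes, jobs))
    # {'crypto': ('cpu01', 0), 'train': ('gpu01', 0), 'crypto2': ('gpu01', 3)}
```

Here the cryptography and deep learning jobs start immediately, while `crypto2` waits in the queue until `train` finishes at time 3. Strict FIFO is the simplest policy, but it can leave resources idle while the job at the head of the queue waits, which is one reason production schedulers add refinements such as backfilling.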

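Besides, writing code from scratch that runs across multiple nodes requires a lot of overhead effort. A scheduler like PBS takes care of allocating nodes to your job and launching it across them, providing the setup that inter-node communication (for example, via MPI) relies on, so that multi-node programs run efficiently and correctly.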

These schedulers can be optimized and configured in very complex ways, for example by providing high-priority queues or by monitoring user activity.
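As a small illustration of how a high-priority queue changes the order in which jobs run, the Python sketch below keeps waiting jobs in a heap ordered by priority, falling back to submission order among jobs with equal priority. The `PriorityJobQueue` class and the queue names and values in `QUEUE_PRIORITY` are hypothetical; production schedulers typically combine many more factors (such as fair-share usage and time spent waiting) when ranking jobs.

```python
import heapq
import itertools

# Hypothetical queue names and priorities; real clusters define their own.
QUEUE_PRIORITY = {"high": 10, "normal": 0, "low": -10}


class PriorityJobQueue:
    """Toy job queue: highest priority first, submission order within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving submission order

    def submit(self, job_name, queue="normal"):
        priority = QUEUE_PRIORITY[queue]
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), job_name))

    def next_job(self):
        """Remove and return the next job to run, or None if nothing is waiting."""
        if not self._heap:
            return None
        _, _, job_name = heapq.heappop(self._heap)
        return job_name

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    q = PriorityJobQueue()
    q.submit("analysis")
    q.submit("urgent-fix", queue="high")
    q.submit("sweep")
    print([q.next_job() for _ in range(len(q))])
    # ['urgent-fix', 'analysis', 'sweep']
```

The sequence counter in each heap entry keeps ordering stable: two jobs in the same queue never need to be compared by name, and the one submitted first runs first.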

  1. Who can or should use an HPC cluster
  2. When should I use HPC
  3. When should I not use HPC