With HPC, the primary goal is to crunch numbers, not to sort data. It
demands specialized program optimizations to get the most from a system
in terms of input/output, computation, and data movement. And the
machines all have to trust each other because they’re shipping
information back and forth.
HPC systems are based on parallel computing, which uses concurrency to
decrease program runtimes. If the program lends itself to
parallelization, the more nodes you can throw at it, the better. It’s
like painting a fence; if you can get some friends to paint with you,
the job gets done faster. But some applications must follow a serial
process, like building a house. You may have an application that can
only run on one CPU, with no shortcutting, no way to split it up. In
that case, you could just get a bigger, faster box.
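
To put rough numbers on the fence-versus-house picture, here is a minimal Python sketch based on Amdahl's law, a standard rule of thumb that the text above does not name. The function `amdahl_speedup` and its parameters are illustrative names, and the model ignores communication costs entirely, so treat the results as a best case rather than a prediction.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup when only part of a job can be split across workers.

    parallel_fraction: share of the runtime that can be parallelized (0.0 to 1.0).
    workers: number of nodes or CPUs working on the job.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # 1.0 = painting a fence, 0.0 = a strictly serial job, 0.9 = somewhere between.
    for fraction in (1.0, 0.9, 0.0):
        for workers in (1, 4, 16):
            speedup = amdahl_speedup(fraction, workers)
            print(f"parallel={fraction:.1f} workers={workers:2d} speedup={speedup:.2f}")
```

With a fully parallel job the speedup tracks the number of workers. With a fully serial job it stays at 1 no matter how many workers you add, which is the case where a bigger, faster box is the only option.
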
In the aerospace industry you’ve got a lot of calculations to do before
you can call an airplane design good and ship it. Or if you’re in the
oil industry, there may be a patch of ground somewhere that looks
promising, but the geological information requires more processing. So
you bring the field data back, throw it at the cluster, and run it
through some powerful software to figure out if the land has oil. The
first one to figure out that it does gets the bid in and wins. The
other guy loses.
In HPC clusters, all of the machines are working on a specific numeric
problem that needs a significant amount of calculation to find the
answer. For example, a state DMV driver’s license database lookup is
really fast, but it is not HPC; it’s a table lookup, not a calculation.
But if you’re building airplanes and calculating the airflow over the
wing of a 747, the more speed the better. You have to break the wing
surface up into one-inch squares, calculate the flow over each square,
and then repeat the process. The faster the cluster, the faster you’re
going to get your answer.
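
The idea of cutting the wing into small pieces and handing them out can be sketched in Python as below. This only shows the partitioning step: `split_into_chunks`, `placeholder_flow`, and `process_chunk` are hypothetical names, the arithmetic in `placeholder_flow` is a stand-in and not a real airflow calculation, and the chunks are processed one after another here where a real cluster would send each one to a different node.

```python
def split_into_chunks(cells, workers):
    """Divide a list of cells into `workers` contiguous, near-equal chunks."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    base, extra = divmod(len(cells), workers)
    chunks, start = [], 0
    for i in range(workers):
        # The first `extra` chunks take one additional cell each.
        size = base + (1 if i < extra else 0)
        chunks.append(cells[start:start + size])
        start += size
    return chunks


def placeholder_flow(cell):
    """Hypothetical stand-in for the per-cell airflow calculation."""
    x, y = cell
    return 0.5 * x + 0.25 * y


def process_chunk(chunk):
    """Work that one node would do on its share of the wing."""
    return [placeholder_flow(cell) for cell in chunk]


if __name__ == "__main__":
    # A 4 x 3 grid of one-inch cells, identified by (column, row).
    cells = [(x, y) for x in range(4) for y in range(3)]
    chunks = split_into_chunks(cells, workers=5)
    # On a cluster each chunk would go to its own node; here they run in turn.
    results = [value for chunk in chunks for value in process_chunk(chunk)]
    print([len(chunk) for chunk in chunks])
    print(len(results))
```

Keeping the chunks contiguous and close to equal in size means no node sits idle waiting on one that was handed far more cells than the rest.
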
There are three rules for building HPC clusters. Rule number one is
that the application defines how the cluster is built: everything
underneath it is dependent on that application. Just because a
processor is faster on paper doesn’t mean it’s going to be faster on
that application.
Rule two is the balance between computing and communication. A
cluster’s individual members compute like crazy, then at some point
they communicate, and then the cycle repeats. That pattern shapes how
you design your hardware. With a fixed amount of money to work from,
you have to split it between the hardware for communication (the
network) and the hardware for computation (the nodes), plus
infrastructure and so on.
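
The budget trade-off in rule two can be illustrated with a deliberately simple Python model. Everything here is an assumption made for illustration: the function names, the prices, the timings, and the idea that one compute-then-communicate cycle costs the compute time divided by the node count plus a fixed communication time. Real clusters behave in more complicated ways.

```python
def affordable_nodes(budget, node_cost, network_cost_per_node):
    """Nodes a fixed budget buys once each node's share of the network is included."""
    if node_cost + network_cost_per_node <= 0:
        raise ValueError("combined cost per node must be positive")
    return int(budget // (node_cost + network_cost_per_node))


def step_time(compute_seconds, nodes, comm_seconds):
    """Seconds for one compute-then-communicate cycle under the simple model."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return compute_seconds / nodes + comm_seconds


if __name__ == "__main__":
    budget = 100_000          # made-up total budget
    node_cost = 4_000         # made-up price of one compute node
    compute_seconds = 100.0   # made-up single-node compute time per cycle

    # Two made-up network options: (name, cost per node, communication seconds).
    options = [("cheap network", 500, 2.0), ("fast network", 2_000, 0.2)]
    for name, network_cost, comm_seconds in options:
        nodes = affordable_nodes(budget, node_cost, network_cost)
        seconds = step_time(compute_seconds, nodes, comm_seconds)
        print(f"{name}: {nodes} nodes, {seconds:.2f} s per cycle")
```

Changing `compute_seconds` or the communication times shifts which option comes out ahead, which is rule number one again: the application decides.
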
The third rule of clustering is “There will be an upgrade!” Clusters
are not usually static; they can grow machine by machine as the
computing needs of the organization change.