Lev Lafayette (Research Platform Services).
Many contemporary researchers are confronted with significant computational problems. Often their datasets have grown beyond the capacity of their desktop systems, or the complexity of their computational tasks is too great. This becomes even more challenging when one realises that both the size of datasets and the complexity of computation are growing faster than improvements in desktop systems. All of a sudden many researchers discover that, along with their own domain specialisations, they also need an increasingly high level of familiarity with information science.
It is at this point that many researchers may have to turn to high performance computing. For many it is not an easy transition; they may be used to a different operating system and a very different user interface. Many come to the environment with little, if any, experience with the Linux operating system, let alone the command-line interface and batch-job submission. They might be surprised that forwarding of graphics-intensive applications comes with major latency issues, if it is available at all; that ‘data management’ is something meaningful rather than just a buzzword; and that the version of the software being used, and even the compiler used to install it, is suddenly important.
Aspirin ion as produced with the molecular dynamics simulation software NAMD, viewed locally with the molecular modelling software VMD.
All of this makes for a steep learning curve, but the good news is that it is worth it. At this point in one’s research activities one is working in a very advanced environment, and the challenges and results are commensurate. The use of the command-line is no mere fancy - operating at the level of the system shell means that one is very close to the bare metal, rather than being abstracted away by layers of software and user interface, which matters when performance is critical. Nor is knowledge of the shell environment knowledge that goes away; whilst incremental features have been added to the original shell of 1977, it is still fundamentally the same beast, and it will remain so for decades to come - for the rest of one’s research career and beyond.
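To give a flavour of why the shell rewards the learning curve: tasks that would need a script or a spreadsheet elsewhere often reduce to a single pipeline. A small sketch, with a hypothetical comma-separated results file inlined for illustration:

```shell
# Count the most frequent values in the second column of a small
# CSV -- skip the header, extract the column, then tally and rank.
printf 'id,species\n1,wren\n2,wren\n3,crow\n' \
  | tail -n +2 \
  | cut -d, -f2 \
  | sort | uniq -c | sort -rn
```

The same pipeline works unchanged whether the input is three lines or three hundred million, which is precisely the point on a cluster.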
All of this comes together with the batch job submission system. An HPC cluster is essentially a large number of commodity servers linked together to act as one system, even if partitioned according to hardware (or even ownership), and shared between many users. With many users competing for this shared resource, some sort of queueing system is required - hence a scheduler, which receives data from a resource manager and allocates where and when jobs can run. It is because of this capability (in terms of interconnect) and capacity (in terms of processor cores) that users can run their complex or large-dataset tasks. How else is one going to run a complex computational problem that requires dozens of tasks to communicate with each other, without a message passing interface (MPI) application running across multiple compute nodes? Or run the same processing task over dozens of datasets at the same time, as with a job array? Unless you have access to an HPC cluster, this simply cannot be done efficiently or effectively.
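As a concrete sketch of batch submission with the Slurm scheduler used on many clusters, a job-array script might look like the following. This is a config fragment, not a definitive recipe: the job name, time limit, module name, and dataset file names are all assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical Slurm job array: run the same task over 20 datasets at once.
#SBATCH --job-name=myarray
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --array=1-20

# Each array element receives its own SLURM_ARRAY_TASK_ID (1..20),
# used here to select its input file (dataset_1.dat ... dataset_20.dat).
module load my-application        # module name is an assumption
my-application "dataset_${SLURM_ARRAY_TASK_ID}.dat"
```

Submitted once with `sbatch`, the scheduler then decides where and when each of the twenty elements runs, exactly as described above.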
Of course, HPC is not the solution to all research or computational tasks. Long-running single-threaded applications whose steps depend on each other’s results are not always a good fit. Nevertheless, it is perhaps unsurprising to discover that both the availability of HPC systems and HPC training correlate with research output. It is almost as if having powerful computing resources, and the knowledge of how to use them, means that data can be analysed faster and its interpretation by researchers can begin earlier. It is something that many of the top universities around the world have realised, and the University of Melbourne has certainly come on board with this realisation, with major upgrades to the Spartan HPC system this year and with the ‘Petascale Campus’ plans. Most of all, the Research Platforms team will continue to provide the best assistance we possibly can to help researchers get their work done efficiently and effectively.