Developing a unique FPGA testbed for UK researchers
28 October 2021
EPCC is hosting a new FPGA testbed and running it in collaboration with UCL and the University of Warwick. Funded under the ExCALIBUR H&ES programme, the system will let developers experiment with novel hardware for their workloads and understand the role that these technologies might play in future exascale machines.
The testbed will act as a “one-stop shop” where developers can run their code on FPGAs, with all the tooling and necessary licences already installed to lower the barrier to entry as much as possible. In the few months since my previous EPCC News article, we have installed our first set of FPGAs into the testbed and made them available to users.
The FPGA testbed will be a first step towards building a future community and ecosystem around the role of FPGAs in HPC, data science, AI, and machine learning workloads in the UK. It will be made publicly available and will form a unique resource within UK academic computing.
Numerous exciting activities have been undertaken on the testbed, for example accelerating the UK Met Office’s MONC atmospheric model across both Xilinx and Intel FPGAs. The Met Office is one of the major use cases of ExCALIBUR, and so it was natural to select its high-resolution atmospheric MONC model as a test case. An advection kernel of this model was selected and ported to both Xilinx Alveo U280 and Intel Stratix-10 FPGAs. This illustrates a major benefit of the testbed: users can not only explore the algorithmic transformations necessary for their code to run effectively on FPGAs, but also compare the technologies of different vendors to understand the consequences of different choices.
Interestingly, the different vendor offerings seemed suited to different aspects of the workload in this instance. At a single kernel level, the Intel toolchain seemed able to perform more automatic optimisation and thus performed best. The disadvantage was that the programmer had less explicit control, which became more important as the number of kernels was scaled up, with the Xilinx toolchain ultimately able to accommodate more kernels at a higher clock frequency across the chip.
The optimised FPGA versions were then compared against a 24-core Xeon Platinum Cascade Lake CPU and a V100 GPU. Both the FPGAs and the GPU significantly outperformed the CPU, but the GPU was far more challenging to beat and ultimately won out on performance. However, power draw and power efficiency are another important dimension for exascale computing, and here the much lower power draw of the FPGAs, especially the Xilinx Alveos, resulted in significant wins.
This is just one early success story of the testbed. In the short time that it has been available, users have explored numerous HPC simulation codes and optimised existing FPGA libraries. We are now moving into the second phase of the testbed, where we will install Xilinx’s Versal FPGAs and look to further increase the user base by running training courses.
More information about the testbed can be found at
https://fpga.epcc.ed.ac.uk
ExCALIBUR programme
https://excalibur.ac.uk