From CPU to GPU in 80 Days
The Palabos GPU port: from CPU to GPU in 80 days
In 2021, Palabos received a GPU backend allowing to run existing Palabos programs in a multi-GPU environment with a single-line change. The specifics of the GPU port are summarized here.
|Repository of the GPU port:
|C++ parallel algorithms
|Type of GPUs:
|So far tested on NVIDIA GPUs only (but the formalism is vendor independent)
Palabos performance on GPU
We report here the performance obtained with the 3D cavity flow benchmark on a single A100 (40 GB) GPU and on a node with 4 A100 (40 GB) GPUs with NVLINK interconnect.
Single-GPU performance (A100 40 GB)
On a single A100 GPU, Palabos achieves 3610 MLUPS with double-precision floats and 7050 MLUPS with single-precision floats.
Performance on 4 A100 40 GB GPUs
The multi-GPU performance is tested in a strong scaling regime (overall problem size remains constant): the code runs more than 3 times faster than on a single GPU. Compared to a 48-core Xeon Gold CPU (executed with the original CPU-parallel version of Palabos), the 4-GPU code runs 55 times faster.
Test cases executed on GPU so far