Offload Fortran Workloads to New Intel® GPUs Using OpenMP*

The Intel® Fortran Compiler supports OpenMP* 5.0 and 5.1 offloading to GPUs and is already in use in released applications. This robust implementation gives Fortran programmers access to many capabilities of Intel® Data Center GPUs right from their native language.

This demo showcases the offloading capabilities of Intel Fortran Compiler using GRILLIX/PARALLAX, a plasma turbulence and transport simulator. It ports a small portion of this code, originally written for the CPU, to run on an Intel® Data Center GPU Max Series (formerly code named Ponte Vecchio) using OpenMP* offloading, and then compares the performance obtained across three devices:

  • Intel® Xeon® CPU Max Series (formerly code named Sapphire Rapids)
  • 3rd gen Intel® Xeon® Scalable processors
  • Intel Data Center GPU Max Series

The video covers:

  • The two leading advantages of Intel Fortran Compiler: its performance and its support for Fortran standards up to Fortran 2018. Most importantly, it supports OpenMP, a vendor-agnostic, open-standard solution for portable and scalable shared-memory multiprocessing.
  • How the compiler can offload Fortran to GPUs. This demo focuses on porting GRILLIX/PARALLAX, written in Fortran 2008 for CPUs only. Having a robust compiler and performant code is essential for this real-life, complex application. GRILLIX and PARALLAX exploit hybrid OpenMP and MPI parallelism but so far have not explored the OpenMP offloading features.
  • A focus on the most expensive part of the application: the multigrid solver. Multigrid methods are a complex class of linear solvers composed of multiple independent components.
  • Porting of the Jacobi and red-black Gauss-Seidel smoothers from CPU to GPU, including the code and OpenMP directives, which are vendor neutral and can be used with any compiler that has a back end for the target architecture.
  • Assessment of performance gains from this small code change: a runtime of 0.6 seconds on Intel Data Center GPU Max Series versus roughly 4.3 seconds on Intel® Xeon® CPU Max Series, about 7 times faster.
  • Additional code fine-tuning to explore data locality, vectorization, and pipelining for the GPU solver. In the case of GRILLIX/PARALLAX, a simple one-time reordering of how the data is stored in GPU memory results in a further 40% performance gain on an Intel Data Center GPU Max Series.
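
As a rough illustration of the smoother-porting step described above, the following self-contained sketch offloads a Jacobi sweep and a red-black Gauss-Seidel sweep for a 2D Poisson model problem. It is not GRILLIX/PARALLAX code: the program name, grid size, sweep counts, right-hand side, and the residual_norm helper are all assumptions made for this example. It uses only standard OpenMP directives (target data, target teams distribute parallel do).

```fortran
program smoother_offload_sketch
  ! Hypothetical model problem: -Laplace(u) = f on the unit square, u = 0 on the boundary.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 128        ! interior points per dimension (assumed)
  integer, parameter :: nsweeps = 20   ! sweeps per smoother (assumed)
  real(dp), parameter :: h2 = (1.0_dp / real(n + 1, dp))**2
  real(dp), allocatable :: u(:,:), unew(:,:), f(:,:)
  real(dp) :: res_start, res_jacobi, res_rbgs
  integer :: i, j, sweep, color

  allocate(u(0:n+1, 0:n+1), unew(0:n+1, 0:n+1), f(0:n+1, 0:n+1))
  u = 0.0_dp
  unew = 0.0_dp
  f = 1.0_dp
  res_start = residual_norm(u, f)

  ! Jacobi: keep all arrays resident on the device for the whole sweep loop.
  !$omp target data map(tofrom: u) map(alloc: unew) map(to: f)
  do sweep = 1, nsweeps
     !$omp target teams distribute parallel do collapse(2)
     do j = 1, n
        do i = 1, n
           unew(i,j) = 0.25_dp * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) &
                                  + h2 * f(i,j))
        end do
     end do
     ! Copy the interior back into u on the device (boundary values stay zero).
     !$omp target teams distribute parallel do collapse(2)
     do j = 1, n
        do i = 1, n
           u(i,j) = unew(i,j)
        end do
     end do
  end do
  !$omp end target data
  res_jacobi = residual_norm(u, f)

  ! Red-black Gauss-Seidel: points of one color read only the other color,
  ! so each half-sweep can be updated in place without a data race.
  !$omp target data map(tofrom: u) map(to: f)
  do sweep = 1, nsweeps
     do color = 0, 1
        !$omp target teams distribute parallel do collapse(2)
        do j = 1, n
           do i = 1, n
              if (mod(i + j, 2) == color) then
                 u(i,j) = 0.25_dp * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) &
                                     + h2 * f(i,j))
              end if
           end do
        end do
     end do
  end do
  !$omp end target data
  res_rbgs = residual_norm(u, f)

  print '(a, es12.4)', 'initial residual:          ', res_start
  print '(a, es12.4)', 'after Jacobi sweeps:       ', res_jacobi
  print '(a, es12.4)', 'after red-black GS sweeps: ', res_rbgs
  if (.not. (res_jacobi < res_start)) error stop 'Jacobi did not reduce the residual'

contains

  ! 2-norm of the residual f - A*u for the 5-point Laplacian, computed on the host.
  function residual_norm(u, f) result(r)
    real(dp), intent(in) :: u(0:,0:), f(0:,0:)
    real(dp) :: r
    integer :: i, j
    r = 0.0_dp
    do j = 1, n
       do i = 1, n
          r = r + (f(i,j) - (4.0_dp * u(i,j) - u(i-1,j) - u(i+1,j) &
                             - u(i,j-1) - u(i,j+1)) / h2)**2
       end do
    end do
    r = sqrt(r)
  end function residual_norm

end program smoother_offload_sketch
```

With the Intel Fortran Compiler, a build along the lines of ifx -fiopenmp -fopenmp-targets=spir64 sketch.f90 (the file name is a placeholder) should enable offloading. Without OpenMP enabled, the directives are treated as comments and the same loops run on the CPU. The target data regions keep the arrays on the device across sweeps, so data is transferred once per smoother rather than once per loop.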

 

Speaker

Rafael Lago, software technical consulting engineer, Intel

 

Download the Software

Get the latest Intel Fortran Compiler as a stand-alone download or as part of the Intel® HPC Toolkit. The compiler:

  • Is production-ready for CPUs and GPUs.
  • Is based on the Intel Fortran Compiler Classic front-end and runtime libraries but uses LLVM* backend compiler technology.
  • Implements standard language features of Fortran 2018, supports the earlier main versions of the Fortran standard from FORTRAN 77 through Fortran 2008, and supports many OpenMP 4.5, 5.0, and 5.1 directives and offloading features.
  • Provides Fortran programmers access to many capabilities of Intel Data Center GPU Max Series right from their native language.
