Feature/openmp #47

Merged
merged 119 commits into from
Sep 3, 2024

Conversation

fluidnumerics-joe (Member):

This PR brings in GPU support via OpenMP, removing the required dependency on HIP/HIPFort. HIPFort has proven challenging to set up with a variety of Fortran compilers, which increases the risk of users entering "dependency hell".

OpenMP offloading allows users to build CPU-only versions quite easily. If GPU offloading (or multi-core CPU offloading!) is desired, users will need a compiler that supports OpenMP 4.5 target offloading. These include

  • ROCm amdflang
  • LLVM flang
  • Nvidia-HPC nvfortran
  • Intel OneAPI (ifx)
  • Cray Fortran (CCE)

A complete list of supporting compilers can be found at https://www.openmp.org/resources/openmp-compilers-tools/#compilers

This implementation also cuts out a significant amount of code compared to the HIPFort version (with or without managed memory). This makes it more tractable, IMO, to focus on adding additional pre-canned models, alongside CLI and options configuration, within this repository.

Another side benefit: users who want to build their own models can simply concretize the flux, Riemann solver, and source terms as type-bound procedures of extensions of the various model base types (written strictly in Fortran), without having to worry about host/device memory management. With OpenMP it is fairly straightforward to offload those new models to GPUs if desired.

To hook in BLAS on GPUs, we can use the `use_device_ptr` clause of the `omp target data` directive to integrate with other packages like hipBLAS or MAGMA. This would likely come in a future PR that focuses on performance.

Main outcomes

  • Reduce dependencies
  • Simplify install process for users
  • Simplify GPU memory and kernel management
  • Reduce LOC

  • The managed memory interfaces are quite a few lines of code; I'd rather have that sit in an external dependency. We need a more robust build setup in CMake to detect the correct includes and libs for HIPFort, but for now we are ok on noether.
  • There is a pattern we can exploit for derivative, gridinterp, and boundaryinterp in 1-D.
  • We're now using the hipblas implementation of the tensor divergence.
  • Next, we need to sort out the boundary terms for the DG divergence of a tensor so that we can do DG gradients.
  • "Fixes" here means fixes to blatantly wrong OpenMP directives (e.g. missing `omp end target` or misspelled variables in `map` clauses).
Multicore and GPU offloading currently fail:

  • GPU offloading fails at build time with amdflang for gfx90a.
  • Multicore offloading fails at runtime in the first target region of every test with a segmentation fault when using amdflang.
  • Multicore offloading fails at build time with GNU compilers (13.2.0).

We can now build executables for Nvidia GPUs. Still need to do tests :)
fluidnumerics-joe (Member Author):

Currently waiting on spack/spack#44737 to merge for feq-parse/2.2.2

  • Change derivative and gridinterp methods to pure functions; this is meant to improve performance.
  • Set gradient and gridinterp operations to pure functions.
  • Set gradient, divergence, and gridinterp operations to pure functions.
  • Set gradient and gridinterp operations to pure functions.
  • Add the curl-invariant form of the contravariant basis vectors for 3-D.
  • `NVFORTRAN-S-0155 - PURE subprograms may not contain OpenMP directives`: the `loop` construct lets us avoid worrying about the different directives required for multicore and GPU architectures. See https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#openmp-loop
  • Build tests are only done here; run tests will be done via superci on Fluid Numerics armory systems.
  • The valgrind check is currently failing within libevent (installed by Spack), while all other tests are fine.
@garrettbyrd garrettbyrd merged commit ea1e960 into main Sep 3, 2024
8 of 11 checks passed