Modern-CPU-GPU-programming

Welcome! This repository holds the wiki page and code base for my final project toward an MS in Computer Science at the University of Tennessee at Chattanooga. The code was developed in C++ with the Kokkos Ecosystem to achieve high performance in a hardware-agnostic way. You express the code you want to run in parallel with Kokkos parallel abstractions, and the same source can then execute in parallel on the CPU and/or the GPU in heterogeneous manycore architectures. The execution target (e.g., CPU/GPU) is set at compile time via one or more of the CMake parameters listed under Build Instructions, along with other optimizations, depending on the architecture.

Brief Background

Heterogeneous parallel programming is essential for exascale and other high-performance systems, given the realities of modern architectures. Kokkos is a C++ performance portability framework that provides a more unified approach to writing HPC applications. As modern memory architectures become more diverse, portability frameworks like Kokkos let us write HPC applications that achieve both performance and portability by compiling and optimizing for the target hardware. Without Kokkos, one would normally have to rewrite an application whenever it needed to run on another cluster or system with a different programming model or hardware architecture. Instead, we can write code once that can, in theory, perform well on any HPC platform without refactoring. This saves a lot of time, as the average HPC application is 300,000-600,000 lines of code. Kokkos also makes it easier to optimize memory access patterns across diverse devices like CPUs and GPUs, since these optimizations can be set at compile time.

Getting started

All you need is a C++ compiler and CMake (but it's more fun if you have OpenMP and CUDA too). At the time of writing, I was using:

gcc/10.2.0 (with OpenMP 4.5)
cmake/3.19.4
cuda/11.3

The code was executed on a compute cluster node with 80 logical cores and four NVIDIA GPUs.

Build Instructions

1) Start by cloning the Kokkos repository. I like doing this in a folder like ~/installs, but if you want to be extra safe, clone directly to $HOME via

cd $HOME
git clone https://github.com/kokkos/kokkos.git

Now we need to build the library. Do this via:

For Building the Serial Backend:

cd kokkos
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<path-to-where-you-want-to-install-kokkos> \
         -DCMAKE_CXX_COMPILER=<path-to-your-g++>

For building with OpenMP Enabled

cmake .. -DCMAKE_INSTALL_PREFIX=<path-to-where-you-want-to-install-kokkos> \
         -DCMAKE_CXX_COMPILER=<path-to-your-g++> \
         -DKokkos_ENABLE_OPENMP=ON

For building with CUDA Enabled

cmake .. -DCMAKE_INSTALL_PREFIX=<path-to-where-you-want-to-install-kokkos> \
         -DCMAKE_CXX_COMPILER=<path-to-kokkos>/bin/nvcc_wrapper \
         -DKokkos_ENABLE_CUDA=ON

Recommended build with advanced optimizations: Here I'm optimizing for NVIDIA VOLTA

cmake .. -DCMAKE_INSTALL_PREFIX=<path-to-where-you-want-to-install-kokkos> \
         -DKokkos_ENABLE_CUDA=ON \
         -DKokkos_ENABLE_CUDA_LAMBDA=ON \
         -DKokkos_ENABLE_CUDA_UVM=ON \
         -DKokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE=ON \
         -DKokkos_ARCH_VOLTA70=ON

When this finishes, run
```
       make install
```

The library is now built, and we are almost done! Now, cd to a folder where you want your source code and clone my repo:

      cd 
      cd your_experiments/ 
      git clone https://github.com/tommygorham/modern-cpu-gpu-programming.git
      cd modern-cpu-gpu-programming

Now you can build and run my programs by cd'ing into Program<#> and running cmake ../ in the build folder. For example:
```
      cd Program1/build 
      !cmake 
      make 
```

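Note: !cmake is shell history expansion. It re-runs the most recent command in your shell history that starts with cmake, so your program is configured with the same CMake arguments that you built the Kokkos library with.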

Run the executable that the CMakeLists.txt told make to build:
```
     ./<exename>    
```

Optional Run-time args

      export OMP_NUM_THREADS=<#>
      export OMP_PROC_BIND=spread
      export OMP_PLACES=threads
      ./<exename> --kokkos-num-devices=4   # if you have 4 GPUs
      ./<exename> --kokkos-numa=2          # if you have 2 NUMA regions

For more detailed information, see my wiki.
