Kokkos
Gregor Daiß edited this page Oct 9, 2020 · 14 revisions
- Adapt toolchain (use spack)
- Adapt Octotiger to compile with Kokkos (split main file, host-only blacklist, patch nvcc_wrapper, ...)
- Remodel GPU kernel buffer management (create memory pools for arbitrary host/GPU/Kokkos data to avoid device malloc)
- Remodel GPU execution management (go from thread_local cuda_streams to executor pools to avoid stream creation)
- Adapt CPU/GPU launch interface for the pools
- Remove Vc from headers
- Remove all thread_local workarounds in the gravity module
- Adapt current CUDA implementation to work with the memory and executor pools (keeps those working with the rest)
- Remove old CUDA management
- Create Kokkos Kernel for the Monopole Interactions
- Adapt Mikael's Kokkos executors and create a unified interface for launching Kokkos Kernels on the CPU and GPU (~ 1 week including cleanup)
- Create Kokkos Kernel for the Multipole Interactions (~ 1-2 weeks)
- Evaluate Kokkos vs CUDA performance (concern: needless fencing in Kokkos) - In Progress
- Bonus: Test on AMD - In Progress
- Merge master into kokkos branch
- Refactor flux scalar single core
- Refactor flux to use explicit SIMD
- Make data structure more GPU-friendly
- Create first GPU (CUDA?) kernel
- Integrate GPU kernel into existing launch infrastructure
- Switch data structure over to the flux way of doing things
- Refactor reconstruct to use the available parallelism more easily (at least 2 weeks, based on my experience porting the last reconstruct; better to plan for extra time)
- Create basic reconstruct GPU kernel
- Interface reconstruct kernel with the existing GPU infrastructure (see part 2)
- Evaluate need for optimizations
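
The buffer management item above (memory pools instead of a device malloc per kernel launch) can be illustrated with a minimal host-only sketch. All names here (`buffer_pool`, `acquire`, `release`, `allocations_for_repeated_launches`) are hypothetical and are not taken from the Octo-Tiger code base. The sketch also assumes plain `new[]` storage in place of CUDA or Kokkos allocations, and it is not thread-safe.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical recycling pool (illustrative only, not Octo-Tiger's API).
// Buffers that are released are kept and handed out again on the next
// request of the same size, so repeated launches do not allocate again.
template <typename T>
class buffer_pool {
public:
    // Return a buffer with `size` elements, reusing a released one if available.
    std::unique_ptr<T[]> acquire(std::size_t size) {
        auto& free_list = free_buffers_[size];
        if (!free_list.empty()) {
            std::unique_ptr<T[]> buffer = std::move(free_list.back());
            free_list.pop_back();
            return buffer;
        }
        // Assumption: a real GPU pool would call the device allocator here.
        ++allocation_count_;
        return std::unique_ptr<T[]>(new T[size]);
    }

    // Give a buffer back to the pool instead of freeing it.
    void release(std::size_t size, std::unique_ptr<T[]> buffer) {
        free_buffers_[size].push_back(std::move(buffer));
    }

    // Number of real allocations performed so far.
    std::size_t allocation_count() const { return allocation_count_; }

private:
    std::map<std::size_t, std::vector<std::unique_ptr<T[]>>> free_buffers_;
    std::size_t allocation_count_ = 0;
};

// Simulate `launches` kernel launches that each need one 512-element buffer
// and report how many real allocations the pool had to perform.
inline std::size_t allocations_for_repeated_launches(int launches) {
    buffer_pool<double> pool;
    for (int i = 0; i < launches; ++i) {
        std::unique_ptr<double[]> buffer = pool.acquire(512);
        buffer[0] = static_cast<double>(i);  // stand-in for kernel work
        pool.release(512, std::move(buffer));
    }
    return pool.allocation_count();
}
```

In this sketch the allocation cost is paid only the first time a given buffer size is requested. The same recycling idea could apply to executors or streams, although that part is not shown here.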