Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite @threaded macro to work with GPUs #534

Merged
merged 25 commits into from
Jun 21, 2024
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
301059e
Fix wrapping for GPU arrays
efaulhaber May 22, 2024
917e0e5
Adapt `@threaded` macro to work with GPUs
efaulhaber May 22, 2024
19b7806
Use new `@threaded` macro
efaulhaber May 22, 2024
9644021
Fix NHS to run GPU code with CPU arrays
efaulhaber May 22, 2024
1ff91d3
Import `@index` from KernelAbstractions.jl
efaulhaber May 23, 2024
7514ac5
Add proper docs for `@threaded`
efaulhaber Jun 12, 2024
eb4903b
Fix `foreach_neighbor`
efaulhaber Jun 12, 2024
bafccab
Reformat
efaulhaber Jun 12, 2024
e7b6b6c
Add test for GPU code on the CPU
efaulhaber Jun 12, 2024
f10246f
Fix unit tests
efaulhaber Jun 12, 2024
c6103d9
Add unit tests for GPU interaction
efaulhaber Jun 12, 2024
b116731
Also add tests for GPU versions of `wrap_v` and `wrap_u`
efaulhaber Jun 12, 2024
6c14177
Add kwarg `neighborhood_search` to dam break example
efaulhaber Jun 12, 2024
4dab0bd
Fix tests
efaulhaber Jun 12, 2024
3a4a167
Merge branch 'threaded-macro-gpu' of github.com:efaulhaber/TrixiParti…
efaulhaber Jun 12, 2024
a4d8aef
Reformat
efaulhaber Jun 12, 2024
8e8efa8
Add function `wrap_array`
efaulhaber Jun 19, 2024
6f6e02d
Merge branch 'main' into threaded-macro-gpu
efaulhaber Jun 19, 2024
969cad2
Fix merge
efaulhaber Jun 20, 2024
0870d57
Merge branch 'main' into threaded-macro-gpu
efaulhaber Jun 20, 2024
b05f045
Fix open boundary code
efaulhaber Jun 20, 2024
613ec7e
Implement suggestions
efaulhaber Jun 20, 2024
9795536
Fix syntax error from merge
efaulhaber Jun 20, 2024
9313efc
Fix `wrap_u` and `wrap_v` for GPU array types
efaulhaber Jun 20, 2024
2140a67
Fix constructor of `BoundarySPHSystem`
efaulhaber Jun 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,9 @@ Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DiffEqCallbacks = "459566f4-90b8-5000-8ac3-15dfb0a30def"
FastPow = "c0e83750-1142-43a8-81cf-6c956b72b4d1"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MuladdMacro = "46d2c3a1-f734-5fdb-9937-b9b9aeba4221"
PointNeighbors = "1c4d5385-0a27-49de-8e2c-43b175c8985c"
Expand All @@ -35,6 +37,7 @@ FastPow = "0.1"
ForwardDiff = "0.10"
JSON = "0.21"
MuladdMacro = "0.2"
PointNeighbors = "0.2.3"
Polyester = "0.7.5"
RecipesBase = "1"
Reexport = "1"
Expand All @@ -43,6 +46,5 @@ StaticArrays = "1"
StrideArrays = "0.1"
TimerOutputs = "0.5"
TrixiBase = "0.1"
PointNeighbors = "0.2"
WriteVTK = "1"
julia = "1.9"
6 changes: 4 additions & 2 deletions examples/fluid/dam_break_2d.jl
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,10 @@ boundary_system = BoundarySPHSystem(tank.boundary, boundary_model, adhesion_coef

# ==========================================================================================
# ==== Simulation
semi = Semidiscretization(fluid_system, boundary_system, threaded_nhs_update=true)
ode = semidiscretize(semi, tspan)
semi = Semidiscretization(fluid_system, boundary_system,
neighborhood_search=GridNeighborhoodSearch,
threaded_nhs_update=true)
ode = semidiscretize(semi, tspan, data_type=nothing)

info_callback = InfoCallback(interval=100)

Expand Down
4 changes: 3 additions & 1 deletion examples/n_body/n_body_system.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
using TrixiParticles
using LinearAlgebra

struct NBodySystem{NDIMS, ELTYPE <: Real} <: TrixiParticles.System{NDIMS, Nothing}
# The second type parameter of `System` can't be `Nothing`, or TrixiParticles will launch
# GPU kernel for `for_particle_neighbor` loops.
struct NBodySystem{NDIMS, ELTYPE <: Real} <: TrixiParticles.System{NDIMS, 0}
initial_condition :: InitialCondition{ELTYPE}
mass :: Array{ELTYPE, 1} # [particle]
G :: ELTYPE
Expand Down
6 changes: 5 additions & 1 deletion src/TrixiParticles.jl
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,9 @@ using DataFrames: DataFrame
using DiffEqCallbacks: PeriodicCallback, PeriodicCallbackAffect, PresetTimeCallback
using FastPow: @fastpow
using ForwardDiff: ForwardDiff
using GPUArrays: AbstractGPUArray
using JSON: JSON
using KernelAbstractions: KernelAbstractions, @kernel, @index
using LinearAlgebra: norm, dot, I, tr, inv, pinv, det
using MuladdMacro: @muladd
using Polyester: Polyester, @batch
Expand All @@ -26,7 +28,9 @@ using TrixiBase: trixi_include
using PointNeighbors: PointNeighbors, for_particle_neighbor
using WriteVTK: vtk_grid, MeshCell, VTKCellTypes, paraview_collection, vtk_save

# util needs to be first because of macro @trixi_timeit
# `util.jl` depends on the `GPUSystem` type defined in `system.jl`
include("general/system.jl")
# `util.jl` needs to be next because of the macros `@trixi_timeit` and `@threaded`
include("util.jl")
include("callbacks/callbacks.jl")
include("general/general.jl")
Expand Down
2 changes: 1 addition & 1 deletion src/general/corrections.jl
Original file line number Diff line number Diff line change
Expand Up @@ -440,7 +440,7 @@ function compute_gradient_correction_matrix!(corr_matrix::AbstractArray, system,
end

function correction_matrix_inversion_step!(corr_matrix, system)
@threaded for particle in eachparticle(system)
@threaded system for particle in eachparticle(system)
L = extract_smatrix(corr_matrix, system, particle)

# The matrix `L` only becomes singular when the particle and all neighbors
Expand Down
27 changes: 2 additions & 25 deletions src/general/general.jl
Original file line number Diff line number Diff line change
@@ -1,33 +1,10 @@
# Abstract supertype for all system types. We additionally store the type of the system's
# initial condition, which is `Nothing` when using KernelAbstractions.jl.
abstract type System{NDIMS, IC} end

# When using KernelAbstractions.jl, the initial condition has been replaced by `nothing`
GPUSystem = System{NDIMS, Nothing} where {NDIMS}

abstract type FluidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::FluidSystem) = "fluid"

abstract type SolidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::SolidSystem) = "solid"

abstract type BoundarySystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::BoundarySystem) = "boundary"

@inline function set_zero!(du)
du .= zero(eltype(du))

return du
end

# Note that `semidiscretization.jl` depends on the system types and has to be
# included later.
# Note that `system.jl` has already been included.
# `semidiscretization.jl` depends on the system types and has to be included later.
# `density_calculators.jl` needs to be included before `corrections.jl`.
include("density_calculators.jl")
include("corrections.jl")
include("smoothing_kernels.jl")
include("initial_condition.jl")
include("system.jl")
include("interpolation.jl")
include("file_system.jl")
include("custom_quantities.jl")
Expand Down
8 changes: 8 additions & 0 deletions src/general/gpu.jl
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ Adapt.@adapt_structure DensityDiffusionAntuono
Adapt.@adapt_structure BoundarySPHSystem
Adapt.@adapt_structure BoundaryModelDummyParticles
Adapt.@adapt_structure BoundaryModelMonaghanKajtar
Adapt.@adapt_structure TotalLagrangianSPHSystem

# The initial conditions are only used for initialization, which happens before `adapt`ing
# the semidiscretization, so we don't need to store `InitialCondition`s on the GPU.
Expand All @@ -32,3 +33,10 @@ end
function Adapt.adapt_structure(to::typeof(Array), range::UnitRange)
return range
end

KernelAbstractions.get_backend(::PtrArray) = KernelAbstractions.CPU()
KernelAbstractions.get_backend(system::System) = KernelAbstractions.get_backend(system.mass)

function KernelAbstractions.get_backend(system::BoundarySPHSystem)
KernelAbstractions.get_backend(system.coordinates)
end
11 changes: 11 additions & 0 deletions src/general/neighborhood_search.jl
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,14 @@ function PointNeighbors.for_particle_neighbor(f, system, neighbor_system,
for_particle_neighbor(f, system_coords, neighbor_coords, neighborhood_search,
particles=particles, parallel=parallel)
end

function PointNeighbors.for_particle_neighbor(f, system::GPUSystem, neighbor_system,
system_coords, neighbor_coords,
neighborhood_search;
particles=each_moving_particle(system),
parallel=true)
@threaded system for particle in particles
PointNeighbors.foreach_neighbor(f, system_coords, neighbor_coords,
neighborhood_search, particle)
end
end
31 changes: 27 additions & 4 deletions src/general/semidiscretization.jl
Original file line number Diff line number Diff line change
Expand Up @@ -350,7 +350,7 @@ end

# We have to pass `system` here for type stability,
# since the type of `system` determines the return type.
@inline function wrap_u(u_ode, system, semi)
@inline function wrap_u(u_ode::Array, system, semi)
(; ranges_u) = semi

range = ranges_u[system_indices(system, semi)]
Expand All @@ -364,7 +364,7 @@ end
(StaticInt(u_nvariables(system)), n_moving_particles(system)))
end

@inline function wrap_v(v_ode, system, semi)
@inline function wrap_v(v_ode::Array, system, semi)
(; ranges_v) = semi

range = ranges_v[system_indices(system, semi)]
Expand All @@ -375,6 +375,29 @@ end
(StaticInt(v_nvariables(system)), n_moving_particles(system)))
end

# For non-`Array`s (typically GPU arrays), just reshape. Calling the `PtrArray` code above
# for a `CuArray` yields another `CuArray` (instead of a `PtrArray`) and is 8 times slower
# with double the allocations.
@inline function wrap_u(u_ode, system, semi)
(; ranges_u) = semi

range = ranges_u[system_indices(system, semi)]

@boundscheck @assert length(range) == u_nvariables(system) * n_moving_particles(system)

return reshape(view(u_ode, range), (u_nvariables(system), n_moving_particles(system)))
end

@inline function wrap_v(v_ode, system, semi)
(; ranges_v) = semi

range = ranges_v[system_indices(system, semi)]

@boundscheck @assert length(range) == v_nvariables(system) * n_moving_particles(system)

return reshape(view(v_ode, range), (v_nvariables(system), n_moving_particles(system)))
end

efaulhaber marked this conversation as resolved.
Show resolved Hide resolved
function calculate_dt(v_ode, u_ode, cfl_number, semi::Semidiscretization)
(; systems) = semi

Expand All @@ -391,7 +414,7 @@ function drift!(du_ode, v_ode, u_ode, semi, t)
du = wrap_u(du_ode, system, semi)
v = wrap_v(v_ode, system, semi)

@threaded for particle in each_moving_particle(system)
@threaded system for particle in each_moving_particle(system)
# This can be dispatched per system
add_velocity!(du, v, particle, system)
end
Expand Down Expand Up @@ -490,7 +513,7 @@ function add_source_terms!(dv_ode, v_ode, u_ode, semi)
v = wrap_v(v_ode, system, semi)
u = wrap_u(u_ode, system, semi)

@threaded for particle in each_moving_particle(system)
@threaded system for particle in each_moving_particle(system)
# Dispatch by system type to exclude boundary systems
add_acceleration!(dv, particle, system)
add_source_terms_inner!(dv, v, u, particle, system, source_terms(system))
Expand Down
22 changes: 22 additions & 0 deletions src/general/system.jl
Original file line number Diff line number Diff line change
@@ -1,3 +1,25 @@
# Abstract supertype for all system types. We additionally store the type of the system's
# initial condition, which is `Nothing` when using KernelAbstractions.jl.
abstract type System{NDIMS, IC} end

# When using KernelAbstractions.jl, the initial condition has been replaced by `nothing`
GPUSystem = System{NDIMS, Nothing} where {NDIMS}

abstract type FluidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::FluidSystem) = "fluid"

abstract type SolidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::SolidSystem) = "solid"

abstract type BoundarySystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::BoundarySystem) = "boundary"

@inline function set_zero!(du)
du .= zero(eltype(du))

return du
end

initialize!(system, neighborhood_search) = system

@inline Base.ndims(::System{NDIMS}) where {NDIMS} = NDIMS
Expand Down
7 changes: 4 additions & 3 deletions src/schemes/boundary/dummy_particles/dummy_particles.jl
Original file line number Diff line number Diff line change
Expand Up @@ -291,7 +291,7 @@ function compute_pressure!(boundary_model, ::Union{SummationDensity, ContinuityD

# Limit pressure to be non-negative to avoid attractive forces between fluid and
# boundary particles at free surfaces (sticking artifacts).
@threaded for particle in eachparticle(system)
@threaded system for particle in eachparticle(system)
apply_state_equation!(boundary_model, particle_density(v, boundary_model,
particle), particle)
end
Expand Down Expand Up @@ -346,14 +346,15 @@ function compute_pressure!(boundary_model, ::AdamiPressureExtrapolation,
system_coords, neighbor_coords,
v_neighbor_system, nhs)
end
for particle in eachparticle(system)

@threaded system for particle in eachparticle(system)
# Limit pressure to be non-negative to avoid attractive forces between fluid and
# boundary particles at free surfaces (sticking artifacts).
pressure[particle] = max(pressure[particle], 0.0)
end
end

@trixi_timeit timer() "inverse state equation" @threaded for particle in eachparticle(system)
@trixi_timeit timer() "inverse state equation" @threaded system for particle in eachparticle(system)
compute_adami_density!(boundary_model, system, system_coords, particle)
end
end
Expand Down
2 changes: 1 addition & 1 deletion src/schemes/boundary/system.jl
Original file line number Diff line number Diff line change
Expand Up @@ -208,7 +208,7 @@ function (movement::BoundaryMovement)(system, t)

is_moving(t) || return system

@threaded for particle in moving_particles
@threaded system for particle in moving_particles
pos_new = initial_coords(system, particle) + movement_function(t)
vel = ForwardDiff.derivative(movement_function, t)
acc = ForwardDiff.derivative(t_ -> ForwardDiff.derivative(movement_function, t_), t)
Expand Down
2 changes: 1 addition & 1 deletion src/schemes/fluid/weakly_compressible_sph/system.jl
Original file line number Diff line number Diff line change
Expand Up @@ -295,7 +295,7 @@ function reinit_density!(system, v, u, v_ode, u_ode, semi)
end

function compute_pressure!(system, v)
@threaded for particle in eachparticle(system)
@threaded system for particle in eachparticle(system)
apply_state_equation!(system, particle_density(v, system, particle), particle)
end
end
Expand Down
Loading