Skip to content

Commit

Permalink
StreamHPC 2024-01-16 (#327)
Browse files Browse the repository at this point in the history
* refactor(device_scan.hpp): use hipCUB for sum by key instead of explicitly calling rocPRIM

* Add const qualifier to operator overloading methods of iterators

* fix(device_scan.hpp): derive internal accumulator type from scan operator instead of using the input iterator type

* docs(changelog.md): document the change in accumulator type for device scan

* style(device_scan.hpp): update copyright date

* style(device_scan.hpp): improve styling

* fix: use public function 'rocprim::invoke_result' instead of 'rocprim::detail::invoke_result'

* build: Update CUB/Thrust/libcu++ to 2.2.0

* build: Respect CMAKE_CUDA_FLAGS on the NVIDIA platform

Previously hipCUB would ignore CMAKE_CUDA_FLAGS specified by the user.
This was probably a workaround to the broken logic that would repeat
these flags multiple times when re-configuring the project.
I fixed the same issue in rocRAND by not running this code on subsequent
configurations. Use the same solution here too.

* build(test): Don't mark cmake generated source files to be cleaned

This resulted on them being deleted by `make/ninja clean`, and would not
be re-generated until the next configuration of cmake, resulting in
compilation errors when trying to build after the clean.
When the parallelization is changed this might result in some lose files
sticking around, but they should not cause harm.

* docs(CHANGELOG.md): Add update to CUB version to the changelog [CI skip]

* build: Fix NVCC warnings

Fix some NVCC and GCC warnings. Some of these were false positives, but
the resulting code is cleaner IMO, so it's not worth fighting against the
tools.
Some warnings remain in ROCm 6.0 coming from the HIP headers.

* fix: update github urls to point to new ROCm organization

* docs: fixed changelog style

* ci: Specifying SPHINX_DIR

* build: Updated version in CMakeLists and Changelog

---------

Co-authored-by: Nara Prasetya <[email protected]>
Co-authored-by: Jaap <[email protected]>
Co-authored-by: Gergely Mészáros <[email protected]>
  • Loading branch information
4 people authored Mar 19, 2024
1 parent 5c34150 commit 6cb5938
Show file tree
Hide file tree
Showing 23 changed files with 272 additions and 215 deletions.
2 changes: 2 additions & 0 deletions .gitlab-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -411,6 +411,8 @@ test:nvcc_install:

test:doc:
stage: test
variables:
SPHINX_DIR: $DOCS_DIR/sphinx
extends:
- .rules:test
- .build:docs
Expand Down
2 changes: 1 addition & 1 deletion .jenkins/precheckin.groovy
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/usr/bin/env groovy
// This shared library is available at https://github.com/ROCmSoftwarePlatform/rocJENKINS/
// This shared library is available at https://github.com/ROCm/rocJENKINS/
@Library('rocJenkins@pong') _

// This file is for internal AMD use.
Expand Down
2 changes: 1 addition & 1 deletion .jenkins/staticanalysis.groovy
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/usr/bin/env groovy
// This shared library is available at https://github.com/ROCmSoftwarePlatform/rocJENKINS/
// This shared library is available at https://github.com/ROCm/rocJENKINS/
@Library('rocJenkins@pong') _

// This is file for internal AMD use.
Expand Down
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,14 @@
Documentation for hipCUB is available at
[https://rocm.docs.amd.com/projects/hipCUB/en/latest/](https://rocm.docs.amd.com/projects/hipCUB/en/latest/).

## (Unreleased) hipCUB-3.2.0 for ROCm 6.2.0

### Fixed

* Fixed the derivation for the accumulator type for device scan algorithms in the rocPRIM backend being different compared to CUB.
It now derives the accumulator type as the result of the binary operator.
* The NVIDIA backend now requires CUB, Thrust and libcu++ 2.2.0. If it is not found it will be downloaded from the NVIDIA CCCL repository.

## (Unreleased) hipCUB-3.1.0 for ROCm 6.1.0

### Changes
Expand Down
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ if(BUILD_ADDRESS_SANITIZER)
endif()

# Setup VERSION
set(VERSION_STRING "3.0.0")
set(VERSION_STRING "3.2.0")
rocm_setup_version(VERSION ${VERSION_STRING})

# Print configuration summary
Expand Down
1 change: 0 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,6 @@ python3 -m http.server
* The [rocPRIM](https://github.com/ROCm/rocPRIM) library
* Automatically downloaded and built by the CMake script
* Requires CMake 3.16.9 or later

* For NVIDIA GPUs:
* CUDA Toolkit
* CUB library
Expand Down
4 changes: 1 addition & 3 deletions benchmark/benchmark_device_segmented_reduce.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2020 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2020-2024 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
Expand Down Expand Up @@ -90,8 +90,6 @@ void run_benchmark(benchmark::State& state,
OutputT * d_aggregates_output;
HIP_CHECK(hipMalloc(&d_aggregates_output, segments_count * sizeof(OutputT)));

hipcub::Sum reduce_op;

void * d_temporary_storage = nullptr;
size_t temporary_storage_bytes = 0;

Expand Down
8 changes: 4 additions & 4 deletions benchmark/benchmark_device_select.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2020 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2020-2024 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
Expand Down Expand Up @@ -386,8 +386,8 @@ void run_unique_by_key_benchmark(benchmark::State& state,
}
}

const auto input_values = benchmark_utils::get_random_data<ValueT>(size, ValueT(-1000), ValueT(1000));
unsigned int selected_count_output = 0;
const auto input_values
= benchmark_utils::get_random_data<ValueT>(size, ValueT(-1000), ValueT(1000));

KeyT* d_keys_input;
ValueT* d_values_input;
Expand All @@ -399,7 +399,7 @@ void run_unique_by_key_benchmark(benchmark::State& state,
HIP_CHECK(hipMalloc(&d_values_input, input_values.size() * sizeof(input_values[0])));
HIP_CHECK(hipMalloc(&d_keys_output, input_keys.size() * sizeof(input_keys[0])));
HIP_CHECK(hipMalloc(&d_values_output, input_values.size() * sizeof(input_values[0])));
HIP_CHECK(hipMalloc(&d_selected_count_output, sizeof(selected_count_output)));
HIP_CHECK(hipMalloc(&d_selected_count_output, sizeof(*d_selected_count_output)));

HIP_CHECK(
hipMemcpy(
Expand Down
83 changes: 35 additions & 48 deletions cmake/Dependencies.cmake
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2017-2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
Expand Down Expand Up @@ -139,62 +139,49 @@ endif(USER_BUILD_BENCHMARK)

# CUB (only for CUDA platform)
if(HIP_COMPILER STREQUAL "nvcc")

set(CCCL_MINIMUM_VERSION 2.2.0)
if(NOT DOWNLOAD_CUB)
find_package(cub QUIET)
find_package(thrust QUIET)
find_package(CUB ${CCCL_MINIMUM_VERSION} CONFIG)
find_package(Thrust ${CCCL_MINIMUM_VERSION} CONFIG)
find_package(libcudacxx ${CCCL_MINIMUM_VERSION} CONFIG)
endif()

if(NOT DEFINED CUB_INCLUDE_DIR)
file(
DOWNLOAD https://github.com/NVIDIA/cub/archive/2.1.0.zip
${CMAKE_CURRENT_BINARY_DIR}/cub-2.1.0.zip
STATUS cub_download_status LOG cub_download_log
)
list(GET cub_download_status 0 cub_download_error_code)
if(cub_download_error_code)
message(FATAL_ERROR "Error: downloading "
"https://github.com/NVIDIA/cub/archive/2.1.0.zip failed "
"error_code: ${cub_download_error_code} "
"log: ${cub_download_log} "
)
if (NOT CUB_FOUND OR NOT Thrust_FOUND OR NOT libcudacxx_FOUND)
if(CUB_FOUND OR Thrust_FOUND OR libcudacxx_FOUND)
message(WARNING "Found one of CUB, Thrust or libcu++, but not all of them.
This can lead to mixing different potentially incompatible versions.")
endif()

execute_process(
COMMAND ${CMAKE_COMMAND} -E tar xzf ${CMAKE_CURRENT_BINARY_DIR}/cub-2.1.0.zip
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
RESULT_VARIABLE cub_unpack_error_code
)
if(cub_unpack_error_code)
message(FATAL_ERROR "Error: unpacking ${CMAKE_CURRENT_BINARY_DIR}/cub-2.1.0.zip failed")
endif()
set(CUB_INCLUDE_DIR ${CMAKE_CURRENT_BINARY_DIR}/cub-2.1.0/ CACHE PATH "")
endif()
message(STATUS "CUB, Thrust or libcu++ not found, downloading and extracting CCCL ${CCCL_MINIMUM_VERSION}")
file(DOWNLOAD https://github.com/NVIDIA/cccl/archive/refs/tags/v${CCCL_MINIMUM_VERSION}.zip
${CMAKE_CURRENT_BINARY_DIR}/cccl-${CCCL_MINIMUM_VERSION}.zip
STATUS cccl_download_status LOG cccl_download_log)

if(NOT DEFINED THRUST_INCLUDE_DIR)
file(
DOWNLOAD https://github.com/NVIDIA/thrust/archive/2.1.0.zip
${CMAKE_CURRENT_BINARY_DIR}/thrust-2.1.0.zip
STATUS thrust_download_status LOG thrust_download_log
)
list(GET thrust_download_status 0 thrust_download_error_code)
if(thrust_download_error_code)
list(GET cccl_download_status 0 cccl_download_error_code)
if(cccl_download_error_code)
message(FATAL_ERROR "Error: downloading "
"https://github.com/NVIDIA/thrust/archive/2.1.0.zip failed "
"error_code: ${thrust_download_error_code} "
"log: ${thrust_download_log} "
)
"https://github.com/NVIDIA/cccl/archive/refs/tags/v${CCCL_MINIMUM_VERSION}.zip failed "
"error_code: ${cccl_download_error_code} "
"log: ${cccl_download_log}")
endif()

execute_process(
COMMAND ${CMAKE_COMMAND} -E tar xzf ${CMAKE_CURRENT_BINARY_DIR}/thrust-2.1.0.zip
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
RESULT_VARIABLE thrust_unpack_error_code
)
if(thrust_unpack_error_code)
message(FATAL_ERROR "Error: unpacking ${CMAKE_CURRENT_BINARY_DIR}/thrust-2.1.0.zip failed")
if (CMAKE_VERSION VERSION_GREATER_EQUAL 3.18)
file(ARCHIVE_EXTRACT INPUT ${CMAKE_CURRENT_BINARY_DIR}/cccl-${CCCL_MINIMUM_VERSION}.zip)
else()
execute_process(COMMAND "${CMAKE_COMMAND}" -E tar xf ${CMAKE_CURRENT_BINARY_DIR}/cccl-${CCCL_MINIMUM_VERSION}.zip
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
RESULT_VARIABLE cccl_unpack_error_code)
if(cccl_unpack_error_code)
message(FATAL_ERROR "Error: unpacking ${CMAKE_CURRENT_BINARY_DIR}/cccl-${CCCL_MINIMUM_VERSION}.zip failed")
endif()
endif()
set(THRUST_INCLUDE_DIR ${CMAKE_CURRENT_BINARY_DIR}/thrust-2.1.0/ CACHE PATH "")

find_package(CUB ${CCCL_MINIMUM_VERSION} CONFIG REQUIRED NO_DEFAULT_PATH
PATHS ${CMAKE_CURRENT_BINARY_DIR}/cccl-${CCCL_MINIMUM_VERSION}/cub)
find_package(Thrust ${CCCL_MINIMUM_VERSION} CONFIG REQUIRED NO_DEFAULT_PATH
PATHS ${CMAKE_CURRENT_BINARY_DIR}/cccl-${CCCL_MINIMUM_VERSION}/thrust)
find_package(libcudacxx ${CCCL_MINIMUM_VERSION} CONFIG REQUIRED NO_DEFAULT_PATH
PATHS ${CMAKE_CURRENT_BINARY_DIR}/cccl-${CCCL_MINIMUM_VERSION}/libcudacxx)
endif()
else()
# rocPRIM (only for ROCm platform)
Expand All @@ -206,7 +193,7 @@ else()
message(STATUS "rocPRIM not found. Fetching...")
FetchContent_Declare(
prim
GIT_REPOSITORY https://github.com/ROCmSoftwarePlatform/rocPRIM.git
GIT_REPOSITORY https://github.com/ROCm/rocPRIM.git
GIT_TAG develop
)
FetchContent_MakeAvailable(prim)
Expand Down
40 changes: 22 additions & 18 deletions cmake/SetupNVCC.cmake
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
Expand Down Expand Up @@ -98,26 +98,30 @@ set(NVGPU_TARGETS "${DEFAULT_NVGPU_TARGETS}"
CACHE STRING "List of NVIDIA GPU targets (compute capabilities), for example \"35;50\""
)
set(CMAKE_CUDA_ARCHITECTURES ${NVGPU_TARGETS})
# Generate compiler flags based on targeted CUDA architectures if CMake doesn't. (Controlled by policy CP0104, on by default after 3.18)
if(CMAKE_VERSION VERSION_LESS "3.18")
foreach(CUDA_ARCH ${NVGPU_TARGETS})
list(APPEND HIP_NVCC_FLAGS "--generate-code arch=compute_${CUDA_ARCH},code=sm_${CUDA_ARCH} ")
list(APPEND HIP_NVCC_FLAGS "--generate-code arch=compute_${CUDA_ARCH},code=compute_${CUDA_ARCH} ")
endforeach()
endif()

execute_process(
COMMAND ${HIP_HIPCONFIG_EXECUTABLE} --cpp_config
OUTPUT_VARIABLE HIP_CPP_CONFIG_FLAGS
OUTPUT_STRIP_TRAILING_WHITESPACE
ERROR_STRIP_TRAILING_WHITESPACE
)
if (NOT _HIPCUB_HIP_NVCC_FLAGS_SET)
execute_process(
COMMAND ${HIP_HIPCONFIG_EXECUTABLE} --cpp_config
OUTPUT_VARIABLE HIP_CPP_CONFIG_FLAGS
OUTPUT_STRIP_TRAILING_WHITESPACE
ERROR_STRIP_TRAILING_WHITESPACE
)

# Update list parameter
string(REPLACE ";" " " HIP_NVCC_FLAGS ${HIP_NVCC_FLAGS})
# Generate compiler flags based on targeted CUDA architectures if CMake doesn't. (Controlled by policy CP0104, on by default after 3.18)
if(CMAKE_VERSION VERSION_LESS "3.18")
foreach(CUDA_ARCH ${NVGPU_TARGETS})
list(APPEND HIP_NVCC_FLAGS "--generate-code" "arch=compute_${CUDA_ARCH},code=sm_${CUDA_ARCH}")
list(APPEND HIP_NVCC_FLAGS "--generate-code" "arch=compute_${CUDA_ARCH},code=compute_${CUDA_ARCH}")
endforeach()
endif()

# Update list parameter
list(JOIN HIP_NVCC_FLAGS " " HIP_NVCC_FLAGS)

set(CMAKE_CUDA_FLAGS "${HIP_CPP_CONFIG_FLAGS} ${HIP_NVCC_FLAGS}"
CACHE STRING "Cuda compile flags" FORCE)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} ${HIP_CPP_CONFIG_FLAGS} ${HIP_NVCC_FLAGS}"
CACHE STRING "Cuda compile flags" FORCE)
set(_HIPCUB_HIP_NVCC_FLAGS_SET ON CACHE INTERNAL "")
endif()

# Ignore warnings about #pragma unroll
# and about deprecated CUDA function(s) used in hip/nvcc_detail/hip_runtime_api.h
Expand Down
10 changes: 2 additions & 8 deletions hipcub/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
Expand Down Expand Up @@ -61,13 +61,7 @@ else()
# hipcub_cub target is only for internal use.
add_library(hipcub_cub INTERFACE)
target_link_libraries(hipcub_cub
INTERFACE
hipcub
)
target_include_directories(hipcub_cub
INTERFACE
${CUB_INCLUDE_DIR}
${THRUST_INCLUDE_DIR}
INTERFACE hipcub CUB::CUB Thrust::Thrust libcudacxx::libcudacxx
)
endif()

Expand Down
Loading

0 comments on commit 6cb5938

Please sign in to comment.