hipCUB 3.2.0 for ROCm 6.2.0
Added
- Add `DeviceCopy` function to have parity with CUB.
- In the rocPRIM backend, added `enum WarpExchangeAlgorithm`, which is used as the new optional template argument for `WarpExchange`.
  - The potential values for the enum are `WARP_EXCHANGE_SMEM` and `WARP_EXCHANGE_SHUFFLE`. `WARP_EXCHANGE_SMEM` stands for the previous algorithm, while `WARP_EXCHANGE_SHUFFLE` performs the exchange via shuffle operations. `WARP_EXCHANGE_SHUFFLE` does not require any pre-allocated shared memory, but the `ItemsPerThread` must be a divisor of `WarpSize`.
- Added `tuple.hpp`, which defines the templates `hipcub::tuple`, `hipcub::tuple_element`, `hipcub::tuple_element_t` and `hipcub::tuple_size`.
- Added new overloaded member functions to `BlockRadixSort` and `DeviceRadixSort` that expose a `decomposer` argument. Keys of a custom type (`key_type`) can be sorted via these overloads if an appropriate decomposer is passed. The decomposer has to implement `operator()(const key_type&)`, which returns a `hipcub::tuple` of references pointing to members of `key_type`.
- On AMD GPUs (using the HIP backend), it is possible to issue hipCUB API calls inside of hipGraphs, with several exceptions:
  - `CachingDeviceAllocator`
  - `GridBarrierLifetime`
  - `DeviceSegmentedRadixSort`
  - `DeviceRunLengthEncode`

  Currently, these classes rely on one or more synchronous calls to function correctly. Because of this, they cannot be used inside of hipGraphs.
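
To make the decomposer requirement above more concrete, here is a minimal host-side sketch of the shape such a functor might take. It is not hipCUB code: `std::tuple` stands in for `hipcub::tuple`, and `custom_key`, `custom_key_decomposer` and `sort_by_decomposer` are hypothetical names introduced only for this illustration. The lexicographic ordering of the decomposed members is also an assumption of the sketch, not something these notes state.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical key type for illustration; not part of hipCUB.
struct custom_key
{
    int   group;
    float score;
};

// Decomposer in the shape described above: operator() takes the key and
// returns a tuple of references to its members.
// std::tuple is used here as a host-side stand-in for hipcub::tuple.
struct custom_key_decomposer
{
    std::tuple<const int&, const float&> operator()(const custom_key& key) const
    {
        return std::tie(key.group, key.score);
    }
};

// Host-side reference only: orders keys by comparing their decomposed
// tuples lexicographically (an assumption of this sketch).
template <typename Key, typename Decomposer>
std::vector<Key> sort_by_decomposer(std::vector<Key> keys, Decomposer decomposer)
{
    std::stable_sort(keys.begin(), keys.end(),
                     [&](const Key& a, const Key& b) { return decomposer(a) < decomposer(b); });
    return keys;
}
```

On the device, a functor of this shape would be passed as the `decomposer` argument of the new `BlockRadixSort` or `DeviceRadixSort` overloads; the exact overload signatures are not given in these notes and should be taken from the hipCUB API reference.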
Changed
- The NVIDIA backend now requires CUB, Thrust and libcu++ 2.2.0. If they are not found, they are downloaded from the NVIDIA CCCL repository.
Fixed
- Fixed the derivation for the accumulator type for device scan algorithms in the rocPRIM backend being different compared to CUB. It now derives the accumulator type as the result of the binary operator.
- `debug_synchronous` has been deprecated in hipCUB-2.13.2, and it no longer has any effect. With this release, passing `debug_synchronous` to the device functions results in a deprecation warning both at runtime and at compile time.
  - The synchronization that was previously achievable by passing `debug_synchronous=true` can now be achieved at compile time by setting the `CUB_DEBUG_SYNC` (or higher debug level) or the `HIPCUB_DEBUG_SYNC` preprocessor definition.
  - The compile time deprecation warnings can be disabled by defining the `HIPCUB_IGNORE_DEPRECATED_API` preprocessor definition.
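
The accumulator-type fix listed above can be illustrated with a small host-side sketch. It shows the general idea of deriving the accumulator type from the result of the binary operator rather than from the input type. It is not the hipCUB or rocPRIM implementation: `accumulator_t`, `widening_plus` and `inclusive_scan_ref` are hypothetical names, and the sketch assumes C++17 for `std::invoke_result_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: the accumulator type is the decayed result type of
// calling the binary operator on two input values.
template <typename BinaryOp, typename T>
using accumulator_t = std::decay_t<std::invoke_result_t<BinaryOp, T, T>>;

// Hypothetical operator whose result type is wider than the input type.
struct widening_plus
{
    std::int64_t operator()(std::int64_t a, std::int64_t b) const { return a + b; }
};

// Host-side inclusive scan that accumulates in the operator's result type
// instead of the input type.
template <typename T, typename BinaryOp>
std::vector<accumulator_t<BinaryOp, T>> inclusive_scan_ref(const std::vector<T>& in, BinaryOp op)
{
    using acc_t = accumulator_t<BinaryOp, T>;
    std::vector<acc_t> out;
    out.reserve(in.size());
    if (in.empty())
    {
        return out;
    }
    acc_t acc = in.front();
    out.push_back(acc);
    for (std::size_t i = 1; i < in.size(); ++i)
    {
        acc = op(acc, in[i]);
        out.push_back(acc);
    }
    return out;
}
```

With 32-bit inputs and `widening_plus`, the running sum is held in a 64-bit accumulator, so partial sums that exceed the 32-bit range are not truncated in this sketch.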