diff --git a/docs/api-reference/data-type-support.rst b/docs/api-reference/data-type-support.rst
index e306c148..47f01cf9 100644
--- a/docs/api-reference/data-type-support.rst
+++ b/docs/api-reference/data-type-support.rst
@@ -8,57 +8,16 @@ Data type support
 ******************************************
 
 
-The input and output data types supported by hipCUB are listed here:
+hipCUB supports the following data types on both ROCm and CUDA:
 
-  .. list-table:: Supported Input/Output Types
-    :header-rows: 1
-    :name: supported-input-output-types
+* ``int8``
+* ``int16``
+* ``int32``
+* ``float32``
+* ``float64``
 
-    *
-      - Input/Output Types
-      - AMD Support
-      - CUDA Support
-    *
-      - int8
-      - ✅
-      - ✅
-    *
-      - float8
-      - ❌
-      - ❌
-    *
-      - bfloat8
-      - ❌
-      - ❌
-    *
-      - int16
-      - ✅
-      - ✅
-    *
-      - float16
-      - ✅
-      - ✅ [#]_
-    *
-      - bfloat16
-      - ✅
-      - ✅ [#]_
-    *
-      - int32
-      - ✅
-      - ✅
-    *
-      - tensorfloat32
-      - ❌
-      - ❌
-    *
-      - float32
-      - ✅
-      - ✅
-    *
-      - float64
-      - ✅
-      - ✅
+``float8``, ``bfloat8``, and ``tensorfloat32`` are not supported by hipCUB on either ROCm or CUDA.
 
-.. rubric:: Footnotes
-.. [#] NVIDIA backend can't handle ``float16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacenet_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce`` and ``device_select``.
-.. [#] NVIDIA backend can't handle ``bfloat16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacenet_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce``, ``device_select`` and ``device_histogram``.
+The NVIDIA backend does not support ``float16`` or ``bfloat16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacent_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce`` and ``device_select``.
+
+The NVIDIA backend also does not support ``bfloat16`` with ``device_histogram``.
diff --git a/docs/index.rst b/docs/index.rst
index ef1d4262..8e4f96eb 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -9,16 +9,20 @@ hipCUB documentation
 ===========================
 
 
-hipCUB is a thin header-only wrapper library on top of rocPRIM or CUB. It enables developers to port project
-using CUB library to the `HIP `_ layer and to run them
-on AMD hardware. To learn more, see :ref:`what-is-hipcub`
+hipCUB is a thin, header-only wrapper library for `rocPRIM `_ and `CUB `_. It enables developers to port projects
+using the CUB library to the `HIP `_ layer and run them on AMD hardware. To learn more, see :ref:`what-is-hipcub`
 
-You can access hipCUB code on our `GitHub repository `_.
-
-The documentation is structured as follows:
+The hipCUB repository is located at `https://github.com/ROCm/hipCUB `_.
 
 .. grid:: 2
 
+  .. grid-item-card:: Installation
+
+    * :doc:`Prerequisites `
+    * :doc:`Installation overview `
+    * :doc:`Installing on Windows `
+    * :doc:`Installing on Linux and Windows with CMake `
+
   .. grid-item-card:: API Reference
 
     * :ref:`data-type-support`
diff --git a/docs/install/hipCUB-install-on-Windows.rst b/docs/install/hipCUB-install-on-Windows.rst
new file mode 100644
index 00000000..76375abe
--- /dev/null
+++ b/docs/install/hipCUB-install-on-Windows.rst
@@ -0,0 +1,32 @@
+.. meta::
+  :description: Build and install hipCUB with rmake.py
+  :keywords: install, building, hipCUB, AMD, ROCm, source code, installation script, Windows
+
+********************************************************************
+Building and installing hipCUB on Windows
+********************************************************************
+
+You can use ``rmake.py`` to build and install hipCUB on Microsoft Windows. You can also use `CMake <./hipCUB-install-with-cmake.html>`_ if you want more build and installation options.
+
+
+``rmake.py`` is located in the ``hipCUB`` root directory. To build and install hipCUB with ``rmake.py``, run:
+
+.. 
code:: shell
+
+    python rmake.py -i
+
+This command also downloads `rocPRIM `_ and installs it in ``C:\hipSDK``.
+
+The ``-c`` option builds all clients, including the unit tests:
+
+.. code:: shell
+
+    python rmake.py -c
+
+To see a complete list of ``rmake.py`` options, run:
+
+.. code-block:: shell
+
+    python rmake.py --help
+
+    
\ No newline at end of file
diff --git a/docs/install/hipCUB-install-overview.rst b/docs/install/hipCUB-install-overview.rst
new file mode 100644
index 00000000..772e53d0
--- /dev/null
+++ b/docs/install/hipCUB-install-overview.rst
@@ -0,0 +1,23 @@
+.. meta::
+  :description: hipCUB installation overview
+  :keywords: install, hipCUB, AMD, ROCm, installation, overview, general
+
+*********************************
+hipCUB installation overview
+*********************************
+
+The hipCUB source code is available from the `hipCUB GitHub Repository `_.
+
+The develop branch is the default branch. The develop branch is intended for users who want to preview new features or contribute to the hipCUB code base.
+
+If you don't intend to contribute to the hipCUB code base and won't be previewing features, use a branch that matches the version of ROCm installed on your system.
+
+hipCUB can be built and installed with |rmake|_ on Windows, or `CMake <./hipCUB-install-with-cmake.html>`_ on both Windows and Linux.
+
+.. |install| replace:: ``install``
+.. _install: ./rocThrust-install-script.html
+
+.. |rmake| replace:: ``rmake.py``
+.. _rmake: ./hipCUB-install-on-Windows.html
+
+CMake provides the most flexibility in building and installing hipCUB.
\ No newline at end of file
diff --git a/docs/install/hipCUB-install-with-cmake.rst b/docs/install/hipCUB-install-with-cmake.rst
new file mode 100644
index 00000000..f21bf0db
--- /dev/null
+++ b/docs/install/hipCUB-install-with-cmake.rst
@@ -0,0 +1,56 @@
+.. meta::
+  :description: Build and install hipCUB with CMake
+  :keywords: install, building, hipCUB, AMD, ROCm, source code, cmake
+
+.. 
_install-with-cmake:
+
+********************************************************************
+Building and installing hipCUB with CMake
+********************************************************************
+
+You can build and install hipCUB with CMake on AMD and NVIDIA GPUs on Windows or Linux.
+
+Before you begin, set ``CXX`` to ``amdclang++`` or ``hipcc`` if you're building hipCUB on an AMD GPU, or to ``nvcc`` if you're building hipCUB on an NVIDIA GPU. Then set ``CMAKE_CXX_COMPILER`` to the compiler's absolute path. For example:
+
+.. code:: shell
+
+    CXX=amdclang++
+    CMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++
+
+Create the ``build`` directory inside the ``hipCUB`` directory, then change directory to the ``build`` directory:
+
+.. code:: shell
+
+    mkdir build
+    cd build
+
+Generate the makefile using the ``cmake`` command:
+
+.. code:: shell
+
+    cmake ../. [-D [-D] ...]
+
+The available build options are:
+
+
+* ``BUILD_BENCHMARK``. Set this to ``ON`` to build benchmark tests. Off by default.
+* ``BUILD_TEST``. Set this to ``ON`` to build tests. Off by default.
+* ``DEPENDENCIES_FORCE_DOWNLOAD``. Set this to ``ON`` to download the dependencies regardless of whether or not they are already installed. Off by default.
+
+Build hipCUB using the generated makefile:
+
+.. code:: shell
+
+    make -j4
+
+After you've built hipCUB, you can optionally generate tar, zip, and deb packages:
+
+.. code:: shell
+
+    make package
+
+Finally, install hipCUB:
+
+.. code:: shell
+
+    make install
diff --git a/docs/install/hipCUB-prerequisites.rst b/docs/install/hipCUB-prerequisites.rst
new file mode 100644
index 00000000..881cef31
--- /dev/null
+++ b/docs/install/hipCUB-prerequisites.rst
@@ -0,0 +1,35 @@
+.. 
meta::
+  :description: hipCUB Installation Prerequisites
+  :keywords: install, hipCUB, AMD, ROCm, prerequisites, dependencies, requirements
+
+********************************************************************
+hipCUB prerequisites
+********************************************************************
+
+hipCUB has the following prerequisites on all platforms:
+
+* `CMake `_ version 3.16 or higher
+
+On AMD GPUs:
+
+* `ROCm `_
+* `amdclang++ `_
+* `rocPRIM `_
+
+amdclang++ is installed with ROCm. rocPRIM is automatically downloaded and installed by the CMake script.
+
+On NVIDIA GPUs:
+
+* The CUDA Toolkit
+* CCCL library version 2.3.2 or later
+* CUB and Thrust
+* libcu++ version 2.2.0
+
+The CCCL library is automatically downloaded and built by the CMake script. If libcu++ isn't found on the system, it will be downloaded from the CCCL repository.
+
+On Microsoft Windows:
+
+
+* Python version 3.6 or later
+* Visual Studio 2019 with Clang support
+* Strawberry Perl
diff --git a/docs/introduction.rst b/docs/introduction.rst
deleted file mode 100644
index 60ece9b0..00000000
--- a/docs/introduction.rst
+++ /dev/null
@@ -1,46 +0,0 @@
-
-*************
-Introduction
-*************
-
-.. toctree::
-   :maxdepth: 4
-   :caption: Contents:
-
-Overview
-==================
-
-hipCUB is a thin wrapper library on top of rocPRIM or CUB. It enables developers to port project
-using CUB library to the `HIP `_ layer and to run them
-on AMD hardware. In the `ROCm `_ environment, hipCUB uses
-rocPRIM library as the backend, however, on CUDA platforms it uses CUB instead.
-
-- When using hipCUB you should only include ```` header.
-- When rocPRIM is used as backend ``HIPCUB_ROCPRIM_API`` is defined.
-- When CUB is used as backend ``HIPCUB_CUB_API`` is defined.
-- Backends are automaticaly selected based on platform detected by HIP layer
-  (``__HIP_PLATFORM_AMD__``, ``__HIP_PLATFORM_NVIDIA__``).
-
-rocPRIM backend
-====================================
-
-hipCUB with rocPRIM backend may not support all function and features CUB has because of the
-differences between ROCm (HIP) platform and CUDA platform.
-
-Not-supported features and differences:
-
-- Functions, classes and macros which are not in the public API or not documented are not
-  supported.
-- Device-wide primitives can't be called from kernels (dynamic parallelism is not supported in HIP
-  on ROCm).
-- Storage management and debug functions:
-
-  - ``Debug``, ``PtxVersion``, ``SmVersion`` functions and ``CubDebug``, ``CubDebugExit``,
-    ``_CubLog`` macros are not supported.
-- Intrinsics:
-
-  - ``ThreadExit``, ``ThreadTrap`` - not supported.
-  - Warp thread masks (when used) are 64-bit unsigned integers.
-  - ``member_mask`` input argument is ignored in ``WARP_*`` functions.
-  - Arguments ``first_thread``, ``last_thread``, and ``member_mask`` are ignored in ``Shuffle*``
-    functions.
diff --git a/docs/introduction.rst.orig b/docs/introduction.rst.orig
deleted file mode 100644
index 60ece9b0..00000000
--- a/docs/introduction.rst.orig
+++ /dev/null
@@ -1,46 +0,0 @@
-
-*************
-Introduction
-*************
-
-.. toctree::
-   :maxdepth: 4
-   :caption: Contents:
-
-Overview
-==================
-
-hipCUB is a thin wrapper library on top of rocPRIM or CUB. It enables developers to port project
-using CUB library to the `HIP `_ layer and to run them
-on AMD hardware. In the `ROCm `_ environment, hipCUB uses
-rocPRIM library as the backend, however, on CUDA platforms it uses CUB instead.
-
-- When using hipCUB you should only include ```` header.
-- When rocPRIM is used as backend ``HIPCUB_ROCPRIM_API`` is defined.
-- When CUB is used as backend ``HIPCUB_CUB_API`` is defined.
-- Backends are automaticaly selected based on platform detected by HIP layer
-  (``__HIP_PLATFORM_AMD__``, ``__HIP_PLATFORM_NVIDIA__``).
-
-rocPRIM backend
-====================================
-
-hipCUB with rocPRIM backend may not support all function and features CUB has because of the
-differences between ROCm (HIP) platform and CUDA platform.
-
-Not-supported features and differences:
-
-- Functions, classes and macros which are not in the public API or not documented are not
-  supported.
-- Device-wide primitives can't be called from kernels (dynamic parallelism is not supported in HIP
-  on ROCm).
-- Storage management and debug functions:
-
-  - ``Debug``, ``PtxVersion``, ``SmVersion`` functions and ``CubDebug``, ``CubDebugExit``,
-    ``_CubLog`` macros are not supported.
-- Intrinsics:
-
-  - ``ThreadExit``, ``ThreadTrap`` - not supported.
-  - Warp thread masks (when used) are 64-bit unsigned integers.
-  - ``member_mask`` input argument is ignored in ``WARP_*`` functions.
-  - Arguments ``first_thread``, ``last_thread``, and ``member_mask`` are ignored in ``Shuffle*``
-    functions.
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 79b81038..aac506db 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -4,6 +4,16 @@ root: index
 subtrees:
 - entries:
   - file: what-is-hipcub
+- caption: Installation
+  entries:
+  - file: install/hipCUB-prerequisites
+    title: Installation prerequisites
+  - file: install/hipCUB-install-overview
+    title: Installation overview
+  - file: install/hipCUB-install-on-Windows
+    title: Installing on Windows
+  - file: install/hipCUB-install-with-cmake
+    title: Installing on Linux and Windows with CMake
 - caption: API reference
   entries:
   - file: api-reference/data-type-support
diff --git a/docs/what-is-hipcub.rst b/docs/what-is-hipcub.rst
index 7c48d04b..bf8ea3f4 100644
--- a/docs/what-is-hipcub.rst
+++ b/docs/what-is-hipcub.rst
@@ -9,10 +9,8 @@ What is hipCUB?
 *****************
 
 
-hipCUB is a thin header-only wrapper library on top of rocPRIM or CUB. 
It enables developers to port project
-using CUB library to the `HIP `_ layer and to run them
-on AMD hardware. In the `ROCm `_ environment, hipCUB uses
-rocPRIM library as the backend, while on CUDA platforms it uses CUB.
+hipCUB is a thin, header-only wrapper library for `rocPRIM `_ and `CUB `_. It enables developers to port projects
+using the CUB library to the `HIP `_ layer and run them on AMD hardware.
 
 Here are some key points to be noted: