diff --git a/README.md b/README.md
index 575fdd061a..302d5a3eec 100644
--- a/README.md
+++ b/README.md
@@ -14,24 +14,27 @@ New projects can be developed directly in the portable HIP C++ language and can
## DISCLAIMER
-The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard versionchanges, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated.AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.THIS INFORMATION IS PROVIDED ‘AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
+The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
© 2021 Advanced Micro Devices, Inc. All Rights Reserved.
## Repository branches:
-The HIP repository maintains several branches. The branches that are of importance are:
+On Linux, the HIP open source repository maintains several branches. The branches that are of importance are:
* develop branch: This is the default branch, on which the new features are still under development and visible. While this maybe of interest to many, it should be noted that this branch and the features under development might not be stable.
* Main branch: This is the stable branch. It is up to date with the latest release branch, for example, if the latest HIP release is rocm-4.3, main branch will be the repository based on this release.
* Release branches. These are branches corresponding to each ROCM release, listed with release tags, such as rocm-4.2, rocm-4.3, etc.
-## Release tagging:
+On Windows, however, HIP is not available as open source.
-HIP releases are typically naming convention for each ROCM release to help differentiate them.
+## Release tagging:
+On Linux, HIP releases typically follow a naming convention tied to each ROCm release to help differentiate them.
* rocm x.yy: These are the stable releases based on the ROCM release.
- This type of release is typically made once a month.*
+ This type of release is typically made once a month.
+
+On Windows, HIP is part of the HIP SDK package, and its releases align with each SDK software release.
## More Info:
- [Installation](INSTALL.md)
@@ -109,21 +112,23 @@ vector_square(T *C_d, const T *A_d, size_t N)
The HIP Runtime API code and compute kernel definition can exist in the same source file - HIP takes care of generating host and device code appropriately.
## HIP Portability and Compiler Technology
-HIP C++ code can be compiled with either,
-- On the NVIDIA CUDA platform, HIP provides header file which translate from the HIP runtime APIs to CUDA runtime APIs. The header file contains mostly inlined
+HIP open source C++ code can be compiled on either of the following platforms:
+- On the NVIDIA CUDA platform
+ HIP provides a header file which translates from the HIP runtime APIs to CUDA runtime APIs. The header file contains mostly inlined
functions and thus has very low overhead - developers coding in HIP should expect the same performance as coding in native CUDA. The code is then
- compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK including the CUDA
- profiler and debugger.
-- On the AMD ROCm platform, HIP provides a header and runtime library built on top of HIP-Clang compiler. The HIP runtime implements HIP streams, events, and memory APIs,
- and is a object library that is linked with the application. The source code for all headers and the library implementation is available on GitHub.
- HIP developers on ROCm can use AMD's ROCgdb (https://github.com/ROCm-Developer-Tools/ROCgdb) for debugging and profiling.
+ compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK including the CUDA profiler and debugger.
+- On the AMD platform
+ On Linux, HIP provides a header and runtime library built on top of the HIP-Clang compiler. The HIP runtime implements HIP streams, events, and memory APIs, and is an object library that is linked with the application.
+ On Linux, the source code for all headers and the library implementation is available on GitHub. HIP developers on ROCm Linux can use AMD's ROCgdb (https://github.com/ROCm-Developer-Tools/ROCgdb) for debugging and profiling.
+
+ On Windows, developers can install the HIP SDK and build their own applications by calling HIP APIs from any C++ development tool, such as Microsoft Visual Studio.
Thus HIP source code can be compiled to run on either platform. Platform-specific features can be isolated to a specific platform using conditional compilation. Thus HIP
provides source portability to either platform. HIP provides the _hipcc_ compiler driver which will call the appropriate toolchain depending on the desired platform.
## Examples and Getting Started:
-
+On Linux, with the open source repository:
* A sample and [blog](https://github.com/ROCm-Developer-Tools/HIP/tree/main/samples/0_Intro/square) that uses any of [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/master/README.md) tools to convert a simple app from CUDA to HIP:
@@ -136,7 +141,7 @@ cd samples/01_Intro/square
## More Examples
-The GitHub repository [HIP-Examples](https://github.com/ROCm-Developer-Tools/HIP-Examples.git) contains a hipified version of the popular Rodinia benchmark suite.
+For the open source HIP on Linux, the GitHub repository [HIP-Examples](https://github.com/ROCm-Developer-Tools/HIP-Examples.git) contains a hipified version of the popular Rodinia benchmark suite.
The README with the procedures and tips the team used during this porting effort is here: [Rodinia Porting Guide](https://github.com/ROCm-Developer-Tools/HIP-Examples/blob/master/rodinia_3.0/hip/README.hip_porting)
## Tour of the HIP Directories
diff --git a/docs/markdown/hip_debugging.md b/docs/markdown/hip_debugging.md
index c6e857a90f..050fe3660a 100644
--- a/docs/markdown/hip_debugging.md
+++ b/docs/markdown/hip_debugging.md
@@ -13,7 +13,7 @@ Table of Contents
* [Kernel Enqueue Serialization](#kernel-enqueue-serialization)
* [Making Device visible](#making-device-visible)
* [Dump code object](#dump-code-object)
- * [HSA related environment variables](#HSA-related-environment-variables)
+ * [HSA related environment variables on Linux](#hsa-related-environment-variables-on-linux)
* [ General Debugging Tips](#general-debugging-tips)
## Debugging tools
@@ -127,11 +127,11 @@ Breakpoint 1, main ()
```
### Other Debugging Tools
-There are also other debugging tools available online developers can google and choose the one best suits the debugging requirements.
+Other debugging tools are also available online; developers can search for them and choose the one that best suits their debugging requirements. For example, Microsoft Visual Studio and WinDbg are options on Windows.
## Debugging HIP Applications
-Below is an example to show how to get useful information from the debugger while running a simple memory copy test, which caused an issue of segmentation fault.
+The example below, on Linux, shows how to get useful information from the debugger while running a simple memory copy test that caused a segmentation fault.
```
test: simpleTest2> numElements=4194304 sizeElements=4194304 bytes
@@ -191,11 +191,14 @@ Thread 1 "hipMemcpy_simpl" received signal SIGSEGV, Segmentation fault.
...
```
+On Windows, debugging HIP applications in an IDE such as Microsoft Visual Studio offers a more informative, visual way to debug code, inspect variables, watch multiple details, and examine call stacks.
+
## Useful Environment Variables
-HIP provides some environment variables which allow HIP, hip-clang, or HSA driver to disable some feature or optimization.
+
+HIP provides some environment variables that allow HIP, hip-clang, or the HSA driver on Linux to disable certain features or optimizations.
These are not intended for production but can be useful diagnose synchronization problems in the application (or driver).
-Some of the most useful environment variables are described here. They are supported on the ROCm path.
+Some of the most useful environment variables are described here. They are supported on the ROCm path on both Linux and Windows.
### Kernel Enqueue Serialization
Developers can control kernel command serialization from the host using the environment variable,
@@ -236,8 +239,8 @@ if (totalDeviceNum > 2) {
Developers can dump code object to analyze compiler related issues via setting environment variable,
GPU_DUMP_CODE_OBJECT
-### HSA related environment variables
-HSA provides some environment variables help to analyze issues in driver or hardware, for example,
+### HSA related environment variables on Linux
+On Linux with open source, HSA provides some environment variables that help analyze issues in the driver or hardware, for example,
HSA_ENABLE_SDMA=0
It causes host-to-device and device-to-host copies to use compute shader blit kernels rather than the dedicated DMA copy engines.
@@ -250,23 +253,23 @@ This environment variable can be useful to diagnose interrupt storm issues in th
### Summary of environment variables in HIP
-The following is the summary of the most useful environment variables in HIP.
+The following is a summary of the most useful environment variables in HIP that are supported on Linux and Windows.
-| **Environment variable** | **Default value** | **Usage** |
-| ---------------------------------------------------------------------------------------------------------------| ----------------- | --------- |
-| AMD_LOG_LEVEL <br> Enable HIP log on different Level. | 0 | 0: Disable log. <br> 1: Enable log on error level. <br> 2: Enable log on warning and below levels. <br> 0x3: Enable log on information and below levels. <br> 0x4: Decode and display AQL packets. |
-| AMD_LOG_MASK <br> Enable HIP log on different Level. | 0x7FFFFFFF | 0x1: Log API calls. <br> 0x02: Kernel and Copy Commands and Barriers. <br> 0x4: Synchronization and waiting for commands to finish. <br> 0x8: Enable log on information and below levels. <br> 0x20: Queue commands and queue contents. <br> 0x40:Signal creation, allocation, pool. <br> 0x80: Locks and thread-safety code. <br> 0x100: Copy debug. <br> 0x200: Detailed copy debug. <br> 0x400: Resource allocation, performance-impacting events. <br> 0x800: Initialization and shutdown. <br> 0x1000: Misc debug, not yet classified. <br> 0x2000: Show raw bytes of AQL packet. <br> 0x4000: Show code creation debug. <br> 0x8000: More detailed command info, including barrier commands. <br> 0x10000: Log message location. <br> 0xFFFFFFFF: Log always even mask flag is zero. |
-| HIP_VISIBLE_DEVICES <br> Only devices whose index is present in the sequence are visible to HIP. | | 0,1,2: Depending on the number of devices on the system. |
-| GPU_DUMP_CODE_OBJECT <br> Dump code object. | 0 | 0: Disable. <br> 1: Enable. |
-| AMD_SERIALIZE_KERNEL <br> Serialize kernel enqueue. | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
-| AMD_SERIALIZE_COPY <br> Serialize copies. | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
-| HIP_HOST_COHERENT <br> Coherent memory in hipHostMalloc. | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
-| AMD_DIRECT_DISPATCH <br> Enable direct kernel dispatch. | 1 | 0: Disable. <br> 1: Enable. |
-| GPU_MAX_HW_QUEUES <br> The maximum number of hardware queues allocated per device. | 4 | The variable controls how many independent hardware queues HIP runtime can create per process, per device. If application allocates more HIP streams than this number, then HIP runtime will reuse the same hardware queues for the new streams in round robin manner. Please note, this maximum number does not apply to either hardware queues that are created for CU masked HIP streams, or cooperative queue for HIP Cooperative Groups (there is only one single queue per device). |
+| **Environment variable** | **Default value** | **Usage** |
+| ---------------------------------------- | ----------------- | ---------------------------------------- |
+| AMD_LOG_LEVEL <br> Enable HIP log on different Level. | 0 | 0: Disable log. <br> 1: Enable log on error level. <br> 2: Enable log on warning and below levels. <br> 0x3: Enable log on information and below levels. <br> 0x4: Decode and display AQL packets. |
+| AMD_LOG_MASK <br> Enable HIP log on different Level. | 0x7FFFFFFF | 0x1: Log API calls. <br> 0x02: Kernel and Copy Commands and Barriers. <br> 0x4: Synchronization and waiting for commands to finish. <br> 0x8: Enable log on information and below levels. <br> 0x20: Queue commands and queue contents. <br> 0x40: Signal creation, allocation, pool. <br> 0x80: Locks and thread-safety code. <br> 0x100: Copy debug. <br> 0x200: Detailed copy debug. <br> 0x400: Resource allocation, performance-impacting events. <br> 0x800: Initialization and shutdown. <br> 0x1000: Misc debug, not yet classified. <br> 0x2000: Show raw bytes of AQL packet. <br> 0x4000: Show code creation debug. <br> 0x8000: More detailed command info, including barrier commands. <br> 0x10000: Log message location. <br> 0xFFFFFFFF: Log always, even if the mask flag is zero. |
+| HIP_VISIBLE_DEVICES <br> Only devices whose index is present in the sequence are visible to HIP. | | 0,1,2: Depending on the number of devices on the system. |
+| GPU_DUMP_CODE_OBJECT <br> Dump code object. | 0 | 0: Disable. <br> 1: Enable. |
+| AMD_SERIALIZE_KERNEL <br> Serialize kernel enqueue. | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
+| AMD_SERIALIZE_COPY <br> Serialize copies. | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
+| HIP_HOST_COHERENT <br> Coherent memory in hipHostMalloc. | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
+| AMD_DIRECT_DISPATCH <br> Enable direct kernel dispatch (currently for Linux; under development on Windows). | 1 | 0: Disable. <br> 1: Enable. |
+| GPU_MAX_HW_QUEUES <br> The maximum number of hardware queues allocated per device. | 4 | The variable controls how many independent hardware queues HIP runtime can create per process, per device. If application allocates more HIP streams than this number, then HIP runtime will reuse the same hardware queues for the new streams in round robin manner. Please note, this maximum number does not apply to either hardware queues that are created for CU masked HIP streams, or cooperative queue for HIP Cooperative Groups (there is only one single queue per device). |
## General Debugging Tips
- 'gdb --args' can be used to conveniently pass the executable and arguments to gdb.
-- From inside GDB, you can set environment variables "set env". Note the command does not use an '=' sign:
+- From inside GDB on Linux, you can set environment variables with "set env". Note the command does not use an '=' sign:
```
(gdb) set env AMD_SERIALIZE_KERNEL 3
diff --git a/docs/markdown/hip_faq.md b/docs/markdown/hip_faq.md
index e4725190c0..f908e5d29b 100644
--- a/docs/markdown/hip_faq.md
+++ b/docs/markdown/hip_faq.md
@@ -20,6 +20,7 @@
- [Can I develop HIP code on an AMD HIP-Clang platform?](#can-i-develop-hip-code-on-an-amd-hip-clang-platform)
- [What is ROCclr?](#what-is-rocclr)
- [What is hipamd?](#what-is-hipamd)
+- [Can I get HIP open source repository for Windows?](#can-i-get-hip-open-source-repository-for-windows)
- [Can a HIP binary run on both AMD and Nvidia platforms?](#can-a-hip-binary-run-on-both-amd-and-nvidia-platforms)
- [On HIP-Clang, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang?](#on-HIP-Clang-can-i-link-hip-code-with-host-code-compiled-with-another-compiler-such-as-gcc-icc-or-clang-)
- [HIP detected my platform (hip-clang vs nvcc) incorrectly - what should I do?](#hip-detected-my-platform-hip-clang-vs-nvcc-incorrectly---what-should-i-do)
@@ -34,6 +35,7 @@
- [Does the HIP-Clang compiler support extern shared declarations?](#does-the-hip-clang-compiler-support-extern-shared-declarations)
- [I have multiple HIP enabled devices and I am getting an error message hipErrorNoBinaryForGpu: Unable to find code object for all current devices?](#i-have-multiple-hip-enabled-devices-and-i-am-getting-an-error-message-hipErrorNoBinaryForGpu-unable-to-find-code-object-for-all-current-devices)
- [How to use per-thread default stream in HIP?](#how-to-use-per-thread-default-stream-in-hip)
+- [Can I develop applications with HIP APIs on Windows the same on Linux?](#can-I-develop-applications-with-hip-apis-on-windows-the-same-on-linux)
- [How can I know the version of HIP?](#how-can-I-know-the-version-of-hip)
@@ -185,6 +187,9 @@ ROCclr (Radeon Open Compute Common Language Runtime) is a virtual device interfa
### What is HIPAMD?
HIPAMD is a repository branched out from HIP, mainly the implementation for AMD GPU.
+### Can I get HIP open source repository for Windows?
+No, there is no publicly available HIP repository for Windows.
+
### Can a HIP binary run on both AMD and Nvidia platforms?
HIP is a source-portable language that can be compiled to run on either AMD or NVIDIA platform. HIP tools don't create a "fat binary" that can run on either platform, however.
@@ -274,6 +279,11 @@ Once source is compiled with per-thread default stream enabled, all APIs will be
Besides, per-thread default stream be enabled per translation unit, users can compile some files with feature enabled and some with feature disabled. Feature enabled translation unit will have default stream as per thread and there will not be any implicit synchronization done but other modules will have legacy default stream which will do implicit synchronization.
+### Can I develop applications with HIP APIs on Windows the same on Linux?
+
+Yes, HIP APIs are available to use on both Linux and Windows.
+Because Windows and Linux work differently, HIP APIs call the lower level backend runtime libraries and kernel drivers that correspond to each OS in order to control execution on the GPU hardware. There might be a few differences in the related backend software and driver support, which might affect usage of HIP APIs. See the OS support details in the HIP API document.
+
### How can I know the version of HIP?
HIP version definition has been updated since ROCm 4.2 release as the following:
diff --git a/docs/markdown/hip_logging.md b/docs/markdown/hip_logging.md
index 94858167f2..bf3d86a0ba 100644
--- a/docs/markdown/hip_logging.md
+++ b/docs/markdown/hip_logging.md
@@ -79,7 +79,7 @@ ClPrint(amd::LOG_INFO, amd::LOG_INIT, "Initializing HSA stack.");
## HIP Logging Example:
-Below is an example to enable HIP logging and get logging information during execution of hipinfo,
+Below is an example that enables HIP logging and collects logging information during execution of hipinfo on Linux,
```
user@user-test:~/hip/bin$ export AMD_LOG_LEVEL=4
@@ -107,15 +107,7 @@ clockRate: 1900 Mhz
memoryClockRate: 875 Mhz
memoryBusWidth: 0
clockInstructionRate: 1000 Mhz
-totalGlobalMem: 7.98 GB
-maxSharedMemoryPerMultiProcessor: 64.00 KB
-totalConstMem: 8573157376
-sharedMemPerBlock: 64.00 KB
-canMapHostMemory: 1
-regsPerBlock: 0
-warpSize: 32
-l2CacheSize: 0
-computeMode: 0
+...
maxThreadsPerBlock: 1024
maxThreadsDim.x: 1024
maxThreadsDim.y: 1024
@@ -128,23 +120,7 @@ minor: 12
concurrentKernels: 1
cooperativeLaunch: 0
cooperativeMultiDeviceLaunch: 0
-arch.hasGlobalInt32Atomics: 1
-arch.hasGlobalFloatAtomicExch: 1
-arch.hasSharedInt32Atomics: 1
-arch.hasSharedFloatAtomicExch: 1
-arch.hasFloatAtomicAdd: 1
-arch.hasGlobalInt64Atomics: 1
-arch.hasSharedInt64Atomics: 1
-arch.hasDoubles: 1
-arch.hasWarpVote: 1
-arch.hasWarpBallot: 1
-arch.hasWarpShuffle: 1
-arch.hasFunnelShift: 0
-arch.hasThreadFenceSystem: 1
-arch.hasSyncThreadsExt: 0
-arch.hasSurfaceFuncs: 0
-arch.has3dGrid: 1
-arch.hasDynamicParallelism: 0
+...
gcnArch: 1012
isIntegrated: 0
maxTexture1D: 65536
@@ -171,6 +147,56 @@ memInfo.total: 7.98 GB
memInfo.free: 7.98 GB (100%)
```
+On Windows, AMD_LOG_LEVEL can be set as an environment variable from the advanced system settings, or from a Command Prompt run as administrator. The example below shows some of the debug log information produced when calling the backend runtime on Windows.
+```
+C:\hip\bin>set AMD_LOG_LEVEL=4
+C:\hip\bin>hipinfo
+
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\comgrctx.cpp:33 : 605413686305 us: 29864: [tid:0x9298] Loading COMGR library.
+:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\platform\runtime.cpp:83 : 605413869411 us: 29864: [tid:0x9298] init
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_context.cpp:47 : 605413869502 us: 29864: [tid:0x9298] Direct Dispatch: 0
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:543 : 605413870553 us: 29864: [tid:0x9298] hipGetDeviceCount: Returned hipSuccess :
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:556 : 605413870631 us: 29864: [tid:0x9298] ←[32m hipSetDevice ( 0 ) ←[0m
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:561 : 605413870848 us: 29864: [tid:0x9298] hipSetDevice: Returned hipSuccess :
+--------------------------------------------------------------------------------
+device# 0
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device.cpp:346 : 605413871623 us: 29864: [tid:0x9298] ←[32m hipGetDeviceProperties ( 0000008AEBEFF8C8, 0 ) ←[0m
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device.cpp:348 : 605413871695 us: 29864: [tid:0x9298] hipGetDeviceProperties: Returned hipSuccess :
+Name: AMD Radeon(TM) Graphics
+pciBusID: 3
+pciDeviceID: 0
+pciDomainID: 0
+multiProcessorCount: 7
+maxThreadsPerMultiProcessor: 2560
+isMultiGpuBoard: 0
+clockRate: 1600 Mhz
+memoryClockRate: 1333 Mhz
+memoryBusWidth: 0
+totalGlobalMem: 12.06 GB
+totalConstMem: 2147483647
+sharedMemPerBlock: 64.00 KB
+...
+gcnArchName: gfx90c:xnack-
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:541 : 605413924779 us: 29864: [tid:0x9298] ←[32m hipGetDeviceCount ( 0000008AEBEFF8A4 ) ←[0m
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:543 : 605413925075 us: 29864: [tid:0x9298] hipGetDeviceCount: Returned hipSuccess :
+peers: :3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:176 : 605413928643 us: 29864: [tid:0x9298] ←[32m hipDeviceCanAccessPeer ( 0000008AEBEFF890, 0, 0 ) ←[0m
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:177 : 605413928743 us: 29864: [tid:0x9298] hipDeviceCanAccessPeer: Returned hipSuccess :
+
+non-peers: :3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:176 : 605413930830 us: 29864: [tid:0x9298] ←[32m hipDeviceCanAccessPeer ( 0000008AEBEFF890, 0, 0 ) ←[0m
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:177 : 605413930882 us: 29864: [tid:0x9298] hipDeviceCanAccessPeer: Returned hipSuccess :
+device#0
+...
+:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\pal\palmemory.cpp:430 : 605414517802 us: 29864: [tid:0x9298] Free-: 8000 bytes, VM[ 3007c8000, 3007d0000]
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\devprogram.cpp:2979: 605414517893 us: 29864: [tid:0x9298] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\devprogram.cpp:2979: 605414518259 us: 29864: [tid:0x9298] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
+...
+:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\pal\palmemory.cpp:206 : 605414523422 us: 29864: [tid:0x9298] Alloc: 100000 bytes, ptr[00000003008D0000-00000003009D0000], obj[00000003007D0000-00000003047D0000]
+:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\pal\palmemory.cpp:206 : 605414523767 us: 29864: [tid:0x9298] Alloc: 100000 bytes, ptr[00000003009D0000-0000000300AD0000], obj[00000003007D0000-00000003047D0000]
+:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_memory.cpp:681 : 605414524092 us: 29864: [tid:0x9298] hipMemGetInfo: Returned hipSuccess :
+memInfo.total: 12.06 GB
+memInfo.free: 11.93 GB (99%)
+```
+
## HIP Logging Tips:
- HIP logging works for both release and debug version of HIP application.
diff --git a/docs/markdown/hip_programming_guide.md b/docs/markdown/hip_programming_guide.md
index 80b50b96e5..86bb647ae8 100644
--- a/docs/markdown/hip_programming_guide.md
+++ b/docs/markdown/hip_programming_guide.md
@@ -14,7 +14,8 @@ GPU can directly access the host memory over the CPU/GPU interconnect, without n
There are flags parameter which can specify options how to allocate the memory, for example,
hipHostMallocPortable, the memory is considered allocated by all contexts, not just the one on which the allocation is made.
hipHostMallocMapped, will map the allocation into the address space for the current device, and the device pointer can be obtained with the API hipHostGetDevicePointer().
-hipHostMallocNumaUser is the flag to allow host memory allocation to follow numa policy by user.
+hipHostMallocNumaUser is the flag that allows host memory allocation to follow the Numa policy set by the user. Please note this flag is currently applicable only on Linux; Windows support is under development.
+
All allocation flags are independent, and can be used in any combination without restriction, for instance, hipHostMalloc can be called with both hipHostMallocPortable and hipHostMallocMapped flags set. Both usage models described above use the same allocation flags, and the difference is in how the surrounding code uses the host memory.
See the hipHostMalloc API for more information.
@@ -25,9 +26,10 @@ Target of Numa policy is to select a CPU that is closest to each GPU.
Numa distance is the measurement of how far between GPU and CPU devices.
By default, each GPU selects a Numa CPU node that has the least Numa distance between them, that is, host memory will be automatically allocated closest on the memory pool of Numa node of the current GPU device. Using hipSetDevice API to a different GPU will still be able to access the host allocation, but can have longer Numa distance.
+Note, Numa policy is currently implemented on Linux and is under development on Windows.
### Managed memory allocation
-Managed memory, including the `__managed__` keyword, is supported in HIP combined host/device compilation.
+Managed memory, including the `__managed__` keyword, is supported in HIP combined host/device compilation on Linux. It is not yet supported on Windows (under development).
Managed memory, via unified memory allocation, allows data be shared and accessible to both the CPU and GPU using a single pointer.
The allocation will be managed by AMD GPU driver using the linux HMM (Heterogeneous Memory Management) mechanism, the user can call managed memory API hipMallocManaged to allocate a large chuch of HMM memory, execute kernels on device and fetch data between the host and device as needed.
@@ -49,7 +51,9 @@ else {
}
```
Please note, the managed memory capability check may not be necessary, but if HMM is not supported, then managed malloc will fall back to using system memory and other managed memory API calls will have undefined behavior.
-For more details on managed memory APIs, please refer to the documentation HIP-API.pdf, and the application at (https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp) is a sample usage.
+Note, managed memory is implemented on Linux and is not yet supported on Windows.
+For more details on managed memory APIs, please refer to the documentation HIP-API.pdf.
+The application at (https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp) shows a sample usage on Linux.
### HIP Stream Memory Operations
@@ -108,7 +112,8 @@ A stronger system-level fence can be specified when the event is created with hi
- HIP/ROCm also supports the ability to cache host memory in the GPU using the "Non-Coherent" host memory allocations. This can provide performance benefit, but care must be taken to use the correct synchronization.
## Direct Dispatch
-HIP runtime has Direct Dispatch enabled by default in ROCM 4.4. With this feature we move away from our conventional producer-consumer model where the runtime creates a worker thread(consumer) for each HIP Stream, and the host thread(producer) enqueues commands to a command queue(per stream).
+HIP runtime has Direct Dispatch enabled by default in ROCM 4.4 on Linux.
+With this feature we move away from our conventional producer-consumer model, where the runtime creates a worker thread (consumer) for each HIP Stream, and the host thread (producer) enqueues commands to a command queue (per stream).
For Direct Dispatch, HIP runtime would directly enqueue a packet to the AQL queue (user mode queue on GPU) on the Dispatch API call from the application. That has shown to reduce the latency to launch the first wave on the idle GPU and total time of tiny dispatches synchronized with the host.
@@ -117,14 +122,16 @@ In addition, eliminating the threads in runtime has reduced the variance in the
This feature can be disabled via setting the following environment variable,
AMD_DIRECT_DISPATCH=0
+Note, Direct Dispatch is implemented on Linux. It is currently not supported on Windows.
+
## HIP Runtime Compilation
HIP now supports runtime compilation (hipRTC), the usage of which will provide the possibility of optimizations and performance improvement compared with other APIs via regular offline static compilation.
hipRTC APIs accept HIP source files in character string format as input parameters and create handles of programs by compiling the HIP source files without spawning separate processes.
-For more details on hipRTC APIs, refer to HIP-API.pdf in GitHub (https://github.com/RadeonOpenCompute/ROCm).
+For more details on hipRTC APIs, refer to HIP-API.pdf, available at (https://docs.amd.com/category/api_documentation).
-The link here(https://github.com/ROCm-Developer-Tools/HIP/blob/main/tests/src/hiprtc/saxpy.cpp) shows an example how to program HIP application using runtime compilation mechanism, and detail hipRTC programming guide is also available in Github (https://github.com/ROCm-Developer-Tools/HIP/blob/main/docs/markdown/hip_rtc.md).
+For Linux developers, the link here (https://github.com/ROCm-Developer-Tools/HIP/blob/main/tests/src/hiprtc/saxpy.cpp) shows an example of how to program a HIP application using the runtime compilation mechanism, and a detailed hipRTC programming guide is also available on GitHub (https://github.com/ROCm-Developer-Tools/HIP/blob/main/docs/markdown/hip_rtc.md).
## HIP Graph
HIP graph is supported. For more details, refer to the HIP API Guide.