RPP Exclusive-Or on HOST and HIP #464

Open
wants to merge 64 commits into
base: develop
Changes from 44 commits
0a09a27
Update the initial CODE for HIP Implementation of Exclusive OR
Srihari-mcw Oct 21, 2024
4f5d6f0
Add exclusive_or.hpp hip file
Srihari-mcw Sep 9, 2024
885e808
Update the code for initial HOST Code
Srihari-mcw Sep 10, 2024
e130368
Make SSE based updates for exclusive or
Srihari-mcw Sep 10, 2024
169c59c
Update the code for AVX2 implementation of U8 code
Srihari-mcw Sep 10, 2024
cce836d
Uncomment pragma
Srihari-mcw Sep 10, 2024
5b06d48
Initial commit for I8
Srihari-mcw Sep 11, 2024
7f5df55
Add I8 case
Srihari-mcw Sep 11, 2024
9fe0d11
Fix issues with PKD3 to PLN3 i8 implementation
Srihari-mcw Sep 17, 2024
a83f3fa
Initial updates based on self review
Srihari-mcw Sep 17, 2024
30bd007
More updates
Srihari-mcw Sep 17, 2024
c782fd2
More cleanup
Srihari-mcw Sep 20, 2024
5672802
Update separate code for PLN3 to PLN3 U8
Srihari-mcw Sep 24, 2024
d52e53d
Update separate code for PLN3 to PLN3 I8
Srihari-mcw Sep 24, 2024
10eddc3
Update separate code for PLN3 to PLN3 F32
Srihari-mcw Sep 24, 2024
8937164
Fix compilation issues
Srihari-mcw Sep 24, 2024
b52cef4
Fix accuracy issues for PLN3 to PLN3
Srihari-mcw Sep 24, 2024
f547030
Add comments and formatting
Srihari-mcw Sep 24, 2024
069165d
Rearrange the function declarations
Srihari-mcw Sep 24, 2024
a420d14
Add golden outputs for exclusive or
Srihari-mcw Sep 24, 2024
fa6100d
Add AVX2 flags wherever necessary
Srihari-mcw Sep 24, 2024
700c507
Update the code to have updated F16 load functions
Srihari-mcw Sep 24, 2024
2823e4b
HIP Code Updates
Srihari-mcw Sep 24, 2024
f0732b0
F16 PLN3 to PLN3 Updates
Srihari-mcw Sep 24, 2024
91311c3
Update outputs
Srihari-mcw Oct 21, 2024
13b315f
Rearrange XOR GPU function header
Srihari-mcw Sep 27, 2024
b348dc6
Add empty line
Srihari-mcw Sep 27, 2024
0e18365
Update aligned length
Srihari-mcw Sep 30, 2024
ad5036d
Updates to make F16 outputs consistent with other bit depths
Srihari-mcw Sep 30, 2024
a86e2b0
Add std::nearbyintf in exclusive or hip code
Srihari-mcw Oct 1, 2024
bb3a55a
Update the code to use predefined zero vectors
Srihari-mcw Oct 8, 2024
6f79652
Update to use existing rpp_load96_u8_avx instead of rpp_load96_u8pln3…
Srihari-mcw Oct 9, 2024
db5a2ac
Update the version
Srihari-mcw Oct 23, 2024
a9363ce
Update changelog
Srihari-mcw Oct 23, 2024
bdd306a
Merge branch 'ar/opt_bitwise_xor' into opt_exclusive_or_hip
r-abishek Oct 30, 2024
6c394f0
Merge pull request #338 from Srihari-mcw/opt_exclusive_or_hip
r-abishek Oct 30, 2024
19885c0
Merge branch 'develop' into ar/opt_bitwise_xor
r-abishek Nov 2, 2024
756ba4b
Merge branch 'develop' into ar/opt_bitwise_xor
kiritigowda Nov 5, 2024
1da35eb
Merge branch 'develop' into ar/opt_bitwise_xor
r-abishek Nov 7, 2024
22623a6
Merge branch 'develop' into ar/opt_bitwise_xor
kiritigowda Nov 8, 2024
43852eb
Merge branch 'develop' into ar/opt_bitwise_xor
r-abishek Nov 27, 2024
97653cd
Update CHANGELOG.md
r-abishek Nov 27, 2024
6cfadf8
Merge branch 'develop' into ar/opt_bitwise_xor
kiritigowda Nov 27, 2024
bd312ca
Merge branch 'develop' into ar/opt_bitwise_xor
r-abishek Nov 30, 2024
c981ad2
Merge branch 'develop' into ar/opt_bitwise_xor
Srihari-mcw Dec 6, 2024
0b2be7e
Updates to fix more merge conflicts
Srihari-mcw Dec 6, 2024
2d9f0f2
Update version to 1.9.10 including exclusive or
Srihari-mcw Dec 9, 2024
311f265
Merge pull request #365 from Srihari-mcw/opt_bitwise_xor_rebased
r-abishek Dec 9, 2024
8e5b2a7
Merge branch 'develop' into ar/opt_bitwise_xor
r-abishek Dec 9, 2024
e40d481
Merge branch 'develop' into ar/opt_bitwise_xor
Srihari-mcw Dec 11, 2024
333b811
Merge pull request #370 from Srihari-mcw/opt_bitwise_xor_rebased
r-abishek Dec 11, 2024
3bfb8b7
Merge branch 'develop' into ar/opt_bitwise_xor
Srihari-mcw Dec 13, 2024
40252a1
Remove duplicate definitions of functions
Srihari-mcw Dec 13, 2024
83df6c4
Merge branch 'develop' into ar/opt_bitwise_xor
Srihari-mcw Dec 16, 2024
0ecbc06
Merge branch 'develop' into ar/opt_bitwise_xor
Srihari-mcw Dec 17, 2024
90299a6
Merge branch 'develop' into ar/opt_bitwise_xor
Srihari-mcw Dec 23, 2024
1b616f3
Merge pull request #373 from Srihari-mcw/opt_bitwise_xor_rebased
r-abishek Dec 24, 2024
c7f766a
Merge branch 'develop' into ar/opt_bitwise_xor
kiritigowda Jan 6, 2025
3969611
Merge branch 'develop' into ar/opt_bitwise_xor
kiritigowda Jan 6, 2025
d0f20ac
Merge branch 'develop' into ar/opt_bitwise_xor
kiritigowda Jan 7, 2025
4c8a1f1
Merge branch 'develop' into ar/opt_bitwise_xor
r-abishek Jan 7, 2025
39bed49
Merge branch 'develop' into ar/opt_bitwise_xor
r-abishek Jan 8, 2025
420cdc8
Merge branch 'develop' into ar/opt_bitwise_xor
kiritigowda Jan 8, 2025
0d334f2
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_bit…
r-abishek Jan 15, 2025
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -2,9 +2,16 @@

Full documentation for RPP is available at [https://rocm.docs.amd.com/projects/rpp/en/latest](https://rocm.docs.amd.com/projects/rpp/en/latest)

## (Unreleased) RPP 1.9.9

### Changed

* RPP Tensor Exclusive-Or support on HOST and HIP

## (Unreleased) RPP 1.9.4

### Changes
### Changed

* AMD Clang is now the default CXX and C compiler
* RPP Tensor Box Filter support on HOST

2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -43,7 +43,7 @@ endif()
set(CMAKE_CXX_STANDARD 17)

# RPP Version
set(VERSION "1.9.4")
set(VERSION "1.9.9")

# Set Project Version and Language
project(rpp VERSION ${VERSION} LANGUAGES CXX)
2 changes: 1 addition & 1 deletion include/rpp_version.h
@@ -40,7 +40,7 @@ extern "C" {
// NOTE: IMPORTANT: Match the version with CMakelists.txt version
#define RPP_VERSION_MAJOR 1
#define RPP_VERSION_MINOR 9
#define RPP_VERSION_PATCH 4
#define RPP_VERSION_PATCH 9
#ifdef __cplusplus
}
#endif
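The NOTE in `rpp_version.h` asks that the three macros stay in step with the `VERSION` string in `CMakeLists.txt`. A minimal sketch of how a test could check that is shown here. `RPP_EXPECTED_VERSION`, `rppHeaderVersion` and `rppVersionsMatch` are hypothetical names that do not appear in this PR; the sketch assumes the build system would pass the CMake version in as a compile definition, and it redefines the macros locally only so the snippet is self-contained.

```cpp
#include <string>

// Values copied from the rpp_version.h hunk; a real build would include that header instead.
#ifndef RPP_VERSION_MAJOR
#define RPP_VERSION_MAJOR 1
#define RPP_VERSION_MINOR 9
#define RPP_VERSION_PATCH 9
#endif

// Hypothetical: assumed to be supplied by the build system from the CMake VERSION variable.
#ifndef RPP_EXPECTED_VERSION
#define RPP_EXPECTED_VERSION "1.9.9"
#endif

// Builds "major.minor.patch" from the three header macros.
inline std::string rppHeaderVersion()
{
    return std::to_string(RPP_VERSION_MAJOR) + "." +
           std::to_string(RPP_VERSION_MINOR) + "." +
           std::to_string(RPP_VERSION_PATCH);
}

// True when the header macros agree with the version reported by the build system.
inline bool rppVersionsMatch()
{
    return rppHeaderVersion() == RPP_EXPECTED_VERSION;
}
```

A check like this would flag a bump that touches only one of the two files.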
44 changes: 44 additions & 0 deletions include/rppt_tensor_logical_operations.h
@@ -86,6 +86,50 @@ RppStatus rppt_bitwise_and_host(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr
RppStatus rppt_bitwise_and_gpu(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Exclusive OR computation on HOST backend for a NCHW/NHWC layout tensor
* \details This function computes exclusive OR of corresponding pixels for a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.png Sample Input1
* \image html img150x150_2.png Sample Input2
* \image html logical_operations_exclusive_or_img150x150.png Sample Output
* \param [in] srcPtr1 source1 tensor in HOST memory
* \param [in] srcPtr2 source2 tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] roiTensorPtrSrc ROI data in HOST memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_exclusive_or_host(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);

#ifdef GPU_SUPPORT
/*! \brief Exclusive OR computation on HIP backend for a NCHW/NHWC layout tensor
* \details This function computes exclusive OR of corresponding pixels for a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.png Sample Input1
* \image html img150x150_2.png Sample Input2
* \image html logical_operations_exclusive_or_img150x150.png Sample Output
* \param [in] srcPtr1 source1 tensor in HIP memory
* \param [in] srcPtr2 source2 tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HIP memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] roiTensorPtrSrc ROI data in HIP memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_exclusive_or_gpu(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Bitwise OR computation on HOST backend for a NCHW/NHWC layout tensor
* \details This function computes bitwise OR of corresponding pixels for a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
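The doc comments for `rppt_exclusive_or_host` and `rppt_exclusive_or_gpu` list the value range of each depth but not the per-element arithmetic. The scalar sketch below is one plausible reading, not the SIMD or HIP code from this PR. All function names are hypothetical. The integer paths XOR the raw 8-bit patterns. The float path is an assumption: it scales a value in 0 to 1 up to 0 to 255, rounds with `std::nearbyintf` (the commit list mentions that call for the HIP code), XORs as 8-bit, then scales back.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// U8: plain bitwise XOR of the two pixel values.
inline std::uint8_t xorU8(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a ^ b);
}

// I8: XOR the two's-complement bit patterns, then reinterpret as signed.
inline std::int8_t xorI8(std::int8_t a, std::int8_t b)
{
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(a) ^ static_cast<std::uint8_t>(b));
}

// F32 (assumed behaviour): map 0..1 to 0..255, XOR as 8-bit, map back to 0..1.
// Inputs are expected to lie in the documented 0 to 1 range.
inline float xorF32(float a, float b)
{
    auto toU8 = [](float v) {
        return static_cast<std::uint8_t>(std::nearbyintf(v * 255.0f));
    };
    return static_cast<float>(toU8(a) ^ toU8(b)) / 255.0f;
}

// Element-wise XOR of two U8 buffers; stops at the shorter length.
inline std::vector<std::uint8_t> xorBuffersU8(const std::vector<std::uint8_t>& src1,
                                              const std::vector<std::uint8_t>& src2)
{
    const std::size_t n = std::min(src1.size(), src2.size());
    std::vector<std::uint8_t> dst(n);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = xorU8(src1[i], src2[i]);
    return dst;
}
```

A reference like this can be handy when comparing vectorised output against golden data, since XOR of a value with itself must give zero for every depth.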
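The `roiTensorPtrSrc` parameter documented in the header accepts either XYWH or LTRB regions. The sketch below shows how the two forms could relate. The structs are hypothetical stand-ins, since the real RPP ROI types are not shown in this diff, and the conversion assumes the bottom-right corner is inclusive, which the doc comment does not state.

```cpp
// Hypothetical stand-ins for the two ROI layouts named in the doc comment.
struct RoiXywh { int x, y, roiWidth, roiHeight; };
struct RoiLtrb { int ltX, ltY, rbX, rbY; };

// LTRB to XYWH, assuming rb is the last pixel inside the region (inclusive).
inline RoiXywh toXywh(const RoiLtrb& r)
{
    return { r.ltX, r.ltY, r.rbX - r.ltX + 1, r.rbY - r.ltY + 1 };
}

// XYWH to LTRB under the same inclusive-corner assumption.
inline RoiLtrb toLtrb(const RoiXywh& r)
{
    return { r.x, r.y, r.x + r.roiWidth - 1, r.y + r.roiHeight - 1 };
}
```

Under this assumption a 100 x 50 region starting at (10, 20) has its bottom-right corner at (109, 69). If the library treats the corner as exclusive, the `+ 1` and `- 1` terms would drop out.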