Optimize CPU and Memory performance for Resize linear mode parser #3731

coxuamd · 2024-12-22T13:10:42Z

Re-write calc_neighbor_points() by composing index from binary bits instead of recursion.

With the optimized calc_neighbor_points(), CPU time required by 90% and peak memory utilization is significantly reduced.

Perf. comparision on VM w/ 12-Core EPYC 9V64 + 128 GB mem:

n_dim	out_elements	New t-CPU (us)	Old t-CPU (us)	t-CPU Ratio
4	786432	120405	1494350	0.0806
4	1572864	282763	3826060	0.0739
4	3145728	650957	7941436	0.0820
4	6291456	1304652	14869059	0.0877
4	12582912	2608523	29432326	0.0886
4	25165824	5175560	58848631	0.0879
4	50331648	10486676	118005802	0.0889
4	100663296	21141464	OOM Kill	N/A

src/onnx/parse_resize.cpp

TedThemistokleous

Thank you for your contribution!

I have left comments for the tidy errors and concerns I have with the changes.

Please also run and ensure all our testcases are functional via building migraphx with make check when you build.

Some of the Onnx verify tests are failing with your changes for resize. These need to be working to ensure no lose of functionality between old and new methods.

[==========] 286 tests ran
[  FAILED  ] 2 tests failed
[  FAILED  ] resize_upsample_linear_ac_test
[  FAILED  ] resize_upsample_linear_test

The two files that seem to break are found at

test/onnx/verify/resize_upsample_linear_test.cpp
test/onnx/verify/resize_upsample_linear_ac_test.cpp

Also please ensure your changes meet format as outlined from here

https://github.com/ROCm/AMDMIGraphX/actions/runs/12454557312/job/34765947690?pr=3731

lakhinderwalia · 2024-12-23T15:47:21Z

@TedThemistokleous, Why is this PR showing up on rocm:develop, while others do on develop, please?

TedThemistokleous · 2024-12-23T16:02:07Z

@TedThemistokleous, Why is this PR showing up on rocm:develop, while others do on develop, please?

External contributor I think? I believe its should be the same repo the commit is going to.

codecov · 2024-12-24T01:51:38Z

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.15%. Comparing base (6d02806) to head (772862d).

Files with missing lines	Patch %	Lines
src/onnx/parse_resize.cpp	91.66%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3731      +/-   ##
===========================================
- Coverage    92.16%   92.15%   -0.01%     
===========================================
  Files          515      515              
  Lines        21978    21977       -1     
===========================================
- Hits         20256    20254       -2     
- Misses        1722     1723       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

lakhinderwalia

Thank you for taking a stab at cleaning this function. However, this PR needs two additional things (besides all test-cases to pass):

Code comments for calc_neighbor_points. And this is not the fault of this PR, but the comments were/are entirely missing, yet this function is very complicated for a reviewer's understanding, and I am not sure what exactly is it doing. And that documentation needs to be fixed now, since this function is being rewritten.
A unit-test to specifically test this extremely complex function. It should have been there much earlier, but it can be added now in this PR.

src/onnx/parse_resize.cpp

coxuamd · 2024-12-24T06:52:39Z

Thank you for taking a stab at cleaning this function. However, this PR needs two additional things (besides all test-cases to pass):

Code comments for calc_neighbor_points. And this is not the fault of this PR, but the comments were/are entirely missing, yet this function is very complicated for a reviewer's understanding, and I am not sure what exactly is it doing. And that documentation needs to be fixed now, since this function is being rewritten.

A unit-test to specifically test this extremely complex function. It should have been there much earlier, but it can be added now in this PR.

Here's the explanation of the new algorithm:
Given a std::vector<std::vector<std::vectorstd::size_t>> vvv_ind() in below layout:
vvv_ind{} = {
{{}, {}},
{{}, {}},
{{}, {}},
{{}, {}},
}

1-dim is n_dims, example n_dims = 4;
2-dim is hard-coded to 2, in the original algorithm, ref. as lo/hi
3-dim is a vector of integer with out_elements.

What the original calc_neighbor_points() algorithm is trying to do, is to compose a new vector, with size (2^n_dims * out_elements), within each is a n_dim vector of integer. In current example, if out_elements=16, n_dims=4, then we'll have a new vector as below:

vec_ind{} = {
{}, # Each element is a vector of 4 integer
{},
{},
..
{}, # 2^4 * 16 vectors in total
}

Let's re-write vvv_ind in different pattern for friendly understanding, where each character is a vector of 16 elements, like A = {0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1}.

AB
CD
EF
GH

The original calc_neighbor_points() is done in a recursive way that append elements in vertical, and expand in horizontal. Each 16-element vector will be transposed, notated as A':
A' = {
{0},
{1},
{1},
{1},
{0},
{1},
{1},
{1},
{0},
{1},
{1},
{1},
{0},
{1},
{1},
{1},
}

Recursion:
Pass 0:
A'
B'

Pass 1:
A'C'
B'C'
A'D'
B'D'

Pass 2:
A'C'E'
B'C'E'
A'D'E'
B'D'E'
A'C'F'
B'C'F'
A'D'F'
B'D'F'

Pass 3:
A'C'E'G'
B'C'E'G'
A'D'E'G'
B'D'E'G'
...
B'D'F'G'

Pass 4, last time, with the crafted vector, 2^4 * 16 elements, each element is a vector of 4 integer as the index, to get the index from shape in_s.index(idx)

Since the 2nd dimension is hardcoded to 2, we can treat this dimension as binary. What the final result (before in_s.index(idx)) is actaully we increase from 0 to 2^n_dim, and convert this n-dim bits value to binary, using the bit to index into 2nd-dim of vvv_ind (hi or low), use the position in n-dim to index into 1st-dim of vvv_ind, and loop out_elements so that all elements in A (and other capital character) can be looped.

Taking Pass 3 (before the final in_s.index(idx)) to explain:

vec_ind{}
0-1
AB
CD
EF
GH

Pass 3:
outer loop on 2^n_dims w/ start:
A'C'E'G' >> 0000b >> start(0) convert to 4-bit binary, using each bit to index to vec_ind
-- middle loop on out_elements, so that we process all out_elements(16) in same way.
---- reset bits shift from start, so on each element, shift can cover all bits.
------ inner loop: Pick each bit to index to vec_ind
B'C'E'G' >> 1000b >> start(1), ditto
A'D'E'G' >> 0100b >> start(2), ditto
B'D'E'G' >> 1100b >> start(3), ditto
...
B'D'F'G' >> 1111b

src/onnx/parse_resize.cpp

pfultz2 · 2025-01-03T18:38:19Z

src/onnx/parse_resize.cpp

-                           dim.push_back(i);
-                           return dim;
-                       });
+        throw std::runtime_error("Shape dimension " + std::to_string(n_bits) + " exceeds " +


Use the MIGRAPHX_THROW macro to throw the exception. Also, prefix it with the onnx operator name(usually they make it all uppercase like RESIZE: )

Will fix in next push.

pfultz2 · 2025-01-03T18:41:16Z

src/onnx/parse_resize.cpp

-                           return dim;
-                       });
+        std::bitset<std::numeric_limits<std::size_t>::digits> bits_val = val;
+        std::vector<std::size_t> indices(n_bits);


Is there a commit missing? This should be std::array<std::size_t, std::numeric_limits<std::size_t>::digits> indices;.

Maybe I missed you point. Are you suggesting an array of fixed size std::numeric_limitsstd::size_t::digits instead of vector? Actually indices doesn't have to take 64/32 length long, n_bits is enough.

@pfultz2 need your comment, I'd like to make these changes in one-shot.

@pfultz2 Not update this part. Need your comments.

src/onnx/parse_resize.cpp

Re-write calc_neighbor_points() by composing index from binary bits instead of recursion. With the optimized calc_neighbor_points(), CPU time required by 90% and peak memory utilization is significantly reduced. Perf. comparision on VM w/ 12-Core EPYC 9V64 + 128 GB mem: n_dim out_elements New t-CPU (us) Old t-CPU (us) t-CPU Ratio ------- -------------- ---------------- ---------------- ------------- 4 786,432 170,377 1,878,299 0.0907 4 1,572,864 383,125 4,009,335 0.0956 4 3,145,728 784,388 7,670,960 0.1023 4 6,291,456 1,567,753 15,095,017 0.1039 4 12,582,912 3,139,452 29,622,921 0.1060 4 25,165,824 6,266,153 58,332,233 0.1074 4 50,331,648 12,517,674 116,923,368 0.1071 4 100,663,296 25,011,425 OOM Kill N/A Signed-off-by: Colin Xu <[email protected]>

Revise based on reviewer comments. Signed-off-by: Colin Xu <[email protected]>

Revise implemenation based on reviewer comments. Update performance comparison accordingly. +-------+--------------+----------------+----------------+-------------+ | n_dim | out_elements | New t-CPU (us) | Old t-CPU (us) | t-CPU Ratio | +-------+--------------+----------------+----------------+-------------+ | 4 | 786432 | 120405 | 1494350 | 0.0806 | | 4 | 1572864 | 282763 | 3826060 | 0.0739 | | 4 | 3145728 | 650957 | 7941436 | 0.0820 | | 4 | 6291456 | 1304652 | 14869059 | 0.0877 | | 4 | 12582912 | 2608523 | 29432326 | 0.0886 | | 4 | 25165824 | 5175560 | 58848631 | 0.0879 | | 4 | 50331648 | 10486676 | 118005802 | 0.0889 | | 4 | 100663296 | 21141464 | OOM Kill | N/A | +-------+--------------+----------------+----------------+-------------+ Signed-off-by: Colin Xu <[email protected]>

Revise based on reviewer comments. Rebase to develop HEAD. Signed-off-by: Colin Xu <[email protected]>

coxuamd requested a review from causten as a code owner December 22, 2024 13:10

TedThemistokleous self-requested a review December 22, 2024 15:20

TedThemistokleous added the Perf Improve label Dec 22, 2024

TedThemistokleous reviewed Dec 22, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

TedThemistokleous reviewed Dec 22, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

TedThemistokleous reviewed Dec 22, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

TedThemistokleous reviewed Dec 22, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

TedThemistokleous requested changes Dec 22, 2024

View reviewed changes

TedThemistokleous added the high priority A PR with high priority for review and merging. label Dec 22, 2024

TedThemistokleous assigned coxuamd Dec 22, 2024

TedThemistokleous requested a review from lakhinderwalia December 22, 2024 16:01

coxuamd force-pushed the resize_parse_opt branch from 0e22cd0 to a300e32 Compare December 24, 2024 01:42

coxuamd force-pushed the resize_parse_opt branch from a300e32 to 006eae7 Compare December 24, 2024 03:37

lakhinderwalia requested changes Dec 24, 2024

View reviewed changes

TedThemistokleous reviewed Dec 24, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

coxuamd requested review from lakhinderwalia and TedThemistokleous December 26, 2024 05:25

pfultz2 reviewed Dec 26, 2024

View reviewed changes

src/onnx/parse_resize.cpp Show resolved Hide resolved

pfultz2 reviewed Dec 26, 2024

View reviewed changes

src/onnx/parse_resize.cpp Show resolved Hide resolved

pfultz2 reviewed Dec 26, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

pfultz2 reviewed Dec 26, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

pfultz2 reviewed Dec 26, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

pfultz2 reviewed Dec 26, 2024

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

coxuamd requested a review from pfultz2 January 3, 2025 07:48

coxuamd force-pushed the resize_parse_opt branch 2 times, most recently from cc2d6a0 to 92ef252 Compare January 3, 2025 13:51

pfultz2 reviewed Jan 3, 2025

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

pfultz2 reviewed Jan 3, 2025

View reviewed changes

src/onnx/parse_resize.cpp Outdated Show resolved Hide resolved

coxuamd added 4 commits January 10, 2025 09:52

Optimize CPU and Memory performance for Resize linear mode parser

0304bfe

Revise based on reviewer comments. Signed-off-by: Colin Xu <[email protected]>

Optimize CPU and Memory performance for Resize linear mode parser

772862d

Revise based on reviewer comments. Rebase to develop HEAD. Signed-off-by: Colin Xu <[email protected]>

coxuamd force-pushed the resize_parse_opt branch from 92ef252 to 772862d Compare January 10, 2025 02:41

coxuamd requested a review from pfultz2 January 10, 2025 02:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize CPU and Memory performance for Resize linear mode parser #3731

Optimize CPU and Memory performance for Resize linear mode parser #3731

coxuamd commented Dec 22, 2024 •

edited

Loading

TedThemistokleous left a comment •

edited

Loading

lakhinderwalia commented Dec 23, 2024

TedThemistokleous commented Dec 23, 2024

codecov bot commented Dec 24, 2024 •

edited

Loading

lakhinderwalia left a comment

coxuamd commented Dec 24, 2024 •

edited

Loading

pfultz2 Jan 3, 2025

coxuamd Jan 6, 2025

pfultz2 Jan 3, 2025

coxuamd Jan 6, 2025 •

edited

Loading

coxuamd Jan 8, 2025

coxuamd Jan 10, 2025

Optimize CPU and Memory performance for Resize linear mode parser #3731

Are you sure you want to change the base?

Optimize CPU and Memory performance for Resize linear mode parser #3731

Conversation

coxuamd commented Dec 22, 2024 • edited Loading

TedThemistokleous left a comment • edited Loading

Choose a reason for hiding this comment

lakhinderwalia commented Dec 23, 2024

TedThemistokleous commented Dec 23, 2024

codecov bot commented Dec 24, 2024 • edited Loading

Codecov Report

lakhinderwalia left a comment

Choose a reason for hiding this comment

coxuamd commented Dec 24, 2024 • edited Loading

pfultz2 Jan 3, 2025

Choose a reason for hiding this comment

coxuamd Jan 6, 2025

Choose a reason for hiding this comment

pfultz2 Jan 3, 2025

Choose a reason for hiding this comment

coxuamd Jan 6, 2025 • edited Loading

Choose a reason for hiding this comment

coxuamd Jan 8, 2025

Choose a reason for hiding this comment

coxuamd Jan 10, 2025

Choose a reason for hiding this comment

coxuamd commented Dec 22, 2024 •

edited

Loading

TedThemistokleous left a comment •

edited

Loading

codecov bot commented Dec 24, 2024 •

edited

Loading

coxuamd commented Dec 24, 2024 •

edited

Loading

coxuamd Jan 6, 2025 •

edited

Loading