8343689: AArch64: Optimize MulReduction implementation #225

mikabl-arm · 2025-01-14T17:21:02Z

Add an SVE specialization of the reduce_mul intrinsic for vectors of 256 bits or longer. It repeatedly multiplies the two halves of the source vector using SVE instructions until the result is a 128-bit vector that fits into a SIMD&FP register. From that point on, the existing ASIMD implementation is used.
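
For illustration only, the halving scheme described above can be modelled in scalar Java. This is a sketch of the idea, not the HotSpot code from this change: the class and method names (`MulReductionSketch`, `halveByMultiplying`, `reduceSequentially`, `reduceMul`) are hypothetical, a plain `int[]` stands in for a vector register, and the lane count is assumed to be a power of two.

```java
import java.util.Arrays;

public class MulReductionSketch {
    // Models the SVE stage: multiply the upper half of the lanes into the
    // lower half until only `stopLanes` lanes remain.
    // Assumes lanes.length is a power of two (hypothetical helper).
    static int[] halveByMultiplying(int[] lanes, int stopLanes) {
        int[] v = Arrays.copyOf(lanes, lanes.length);
        int n = v.length;
        while (n > stopLanes) {
            int half = n / 2;
            for (int i = 0; i < half; i++) {
                v[i] *= v[i + half];
            }
            n = half;
        }
        return Arrays.copyOf(v, n);
    }

    // Stand-in for the existing 128-bit reduction: multiply the
    // remaining lanes one after another.
    static int reduceSequentially(int[] lanes) {
        int acc = 1;
        for (int x : lanes) {
            acc *= x;
        }
        return acc;
    }

    // Full reduction: halving stage followed by the 128-bit stage.
    static int reduceMul(int[] lanes, int stopLanes) {
        return reduceSequentially(halveByMultiplying(lanes, stopLanes));
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4, 5, 6, 7, 8};  // eight int lanes = 256 bits
        System.out.println(reduceMul(v, 4)); // four int lanes = 128 bits
    }
}
```

With eight `int` lanes, one halving step leaves four lanes (128 bits); sixteen lanes (512 bits) would take two halving steps before the final stage.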

Benchmark results on an AArch64 CPU with SVE support and a 256-bit vector length:

  Benchmark                 (size)   Mode      Old        New   Units
  Byte256Vector.MULLanes      1024  thrpt  502.498  10222.717  ops/ms
  Double256Vector.MULLanes    1024  thrpt  172.116   3130.997  ops/ms
  Float256Vector.MULLanes     1024  thrpt  291.612   4164.138  ops/ms
  Int256Vector.MULLanes       1024  thrpt  362.276   3717.213  ops/ms
  Long256Vector.MULLanes      1024  thrpt  184.826   2054.345  ops/ms
  Short256Vector.MULLanes     1024  thrpt  379.231   5716.223  ops/ms

Benchmark results on an AArch64 CPU with SVE support and a 512-bit vector length:

  Benchmark                 (size)   Mode      Old       New   Units
  Byte512Vector.MULLanes      1024  thrpt  160.129  2630.600  ops/ms
  Double512Vector.MULLanes    1024  thrpt   51.229  1033.284  ops/ms
  Float512Vector.MULLanes     1024  thrpt   84.617  1658.400  ops/ms
  Int512Vector.MULLanes       1024  thrpt  109.419  1180.310  ops/ms
  Long512Vector.MULLanes      1024  thrpt   69.036   704.144  ops/ms
  Short512Vector.MULLanes     1024  thrpt  131.029  1629.632  ops/ms

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

JDK-8343689: AArch64: Optimize MulReduction implementation (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/panama-vector.git pull/225/head:pull/225
$ git checkout pull/225

Update a local copy of the PR:
$ git checkout pull/225
$ git pull https://git.openjdk.org/panama-vector.git pull/225/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 225

View PR using the GUI difftool:
$ git pr show -t 225

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/panama-vector/pull/225.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2025-01-14T17:21:18Z

👋 Welcome back mablakatov! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-01-14T17:22:03Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

mlbridge · 2025-01-14T17:26:40Z

Webrevs

00: Full (b419aa57)

PaulSandoz · 2025-01-14T17:27:27Z

@mikabl-arm you can create a PR with this change against https://github.com/openjdk/jdk. Since the Vector API is incubating in the jdk/master repo, we prefer to target changes such as this one to that repo.

The panama-vector repo is then used for larger, more speculative changes, rather than for accumulating smaller changes into a larger, harder-to-review PR to jdk/master later on.

openjdk bot added the rfr label Jan 14, 2025
