Performance improvement of transposed [SD]GEMV on A64FX and Neoverse V1. #4989

iha-taisei · 2024-11-26T02:59:39Z

I submitted pull request #4803 to enable SVE for [SD]GEMV on A64FX, but there is still room for performance improvement.
I would therefore like to propose another patch that improves the performance of transposed [SD]GEMV on A64FX and Neoverse V1.

Mousius · 2024-11-26T12:02:49Z

Hi @iha-taisei,

It's always good to keep adding new optimized kernels 😸

How would this be different from https://github.com/OpenMathLib/OpenBLAS/blob/develop/kernel/arm64/gemv_t_sve.c ?

iha-taisei · 2024-12-02T10:47:45Z

Hi @Mousius,

As you can see in #4996, which is linked to this issue, I also applied loop unrolling.
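
For readers unfamiliar with the technique, loop unrolling in a transposed GEMV can be sketched roughly as below. This is not the kernel from #4996: it is plain C rather than SVE intrinsics, and the function name `gemv_t_unrolled`, the unroll factor of 4, and the column-major layout with unit strides for `x` and `y` are all assumptions made for illustration. The idea is that each loaded `x[i]` is reused for several columns, and the independent accumulators give the hardware several multiply-add chains to overlap.

```c
#include <stddef.h>

/*
 * Hypothetical illustration only (not the OpenBLAS kernel):
 * computes y[j] += alpha * sum_i A[i][j] * x[i] for an m x n
 * column-major matrix A with leading dimension lda, i.e. y += alpha * A^T x.
 */
void gemv_t_unrolled(size_t m, size_t n, double alpha,
                     const double *a, size_t lda,
                     const double *x, double *y)
{
    size_t j = 0;

    /* Main loop: process four columns of A per iteration. */
    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + (j + 0) * lda;
        const double *a1 = a + (j + 1) * lda;
        const double *a2 = a + (j + 2) * lda;
        const double *a3 = a + (j + 3) * lda;

        /* Four independent accumulators, one per column. */
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

        for (size_t i = 0; i < m; i++) {
            const double xi = x[i];   /* loaded once, used four times */
            s0 += a0[i] * xi;
            s1 += a1[i] * xi;
            s2 += a2[i] * xi;
            s3 += a3[i] * xi;
        }

        y[j + 0] += alpha * s0;
        y[j + 1] += alpha * s1;
        y[j + 2] += alpha * s2;
        y[j + 3] += alpha * s3;
    }

    /* Remainder loop: handle the last n % 4 columns one at a time. */
    for (; j < n; j++) {
        const double *aj = a + j * lda;
        double s = 0.0;
        for (size_t i = 0; i < m; i++) {
            s += aj[i] * x[i];
        }
        y[j] += alpha * s;
    }
}
```

In an SVE version, the inner loop over `i` would presumably work on whole vectors under a predicate instead of single elements, but the column-wise unrolling would have the same shape.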

martin-frbg · 2024-12-05T21:50:47Z

Closing as resolved/implemented by your #4996. Thank you very much.

iha-taisei mentioned this issue Dec 2, 2024

Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 #4996

Merged

martin-frbg closed this as completed Dec 5, 2024
