Performance improvement of transposed [SD]GEMV on A64FX and Neoverse V1. #4989

Closed
iha-taisei opened this issue Nov 26, 2024 · 3 comments

Comments

@iha-taisei
Contributor

I submitted pull request #4803 to enable SVE for [SD]GEMV on A64FX, but there is still room for performance improvement.
I would therefore like to propose another patch that improves the performance of transposed [SD]GEMV on A64FX and Neoverse V1.

@Mousius
Contributor

Mousius commented Nov 26, 2024

Hi @iha-taisei,

It's always good to keep adding new optimized kernels 😸

How would this be different from https://github.com/OpenMathLib/OpenBLAS/blob/develop/kernel/arm64/gemv_t_sve.c ?

@iha-taisei
Contributor Author

Hi @Mousius,

As you can see above, I also applied loop unrolling.
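For readers unfamiliar with the technique, here is a minimal scalar sketch of what loop unrolling over columns can look like for a transposed GEMV (y += alpha * A^T * x, with A stored column-major). It is not taken from the patch or from `gemv_t_sve.c`: the function names `gemv_t_ref` and `gemv_t_unroll4`, the unroll factor of 4, and the use of plain C instead of SVE intrinsics are all assumptions made for illustration.

```c
#include <stddef.h>

/* Reference transposed GEMV: y[j] += alpha * dot(column j of A, x).
 * A is column-major with m rows, n columns, leading dimension lda.
 * Hypothetical helper, not an OpenBLAS function. */
static void gemv_t_ref(size_t m, size_t n, float alpha,
                       const float *a, size_t lda,
                       const float *x, float *y)
{
    for (size_t j = 0; j < n; j++) {
        float sum = 0.0f;
        for (size_t i = 0; i < m; i++)
            sum += a[j * lda + i] * x[i];
        y[j] += alpha * sum;
    }
}

/* Same computation with the column loop unrolled by 4 (assumed factor).
 * Each x[i] is loaded once and reused for four independent accumulators,
 * which reduces loads of x and exposes more instruction-level parallelism. */
static void gemv_t_unroll4(size_t m, size_t n, float alpha,
                           const float *a, size_t lda,
                           const float *x, float *y)
{
    size_t j = 0;

    /* Main loop: process four columns per iteration. */
    for (; j + 4 <= n; j += 4) {
        const float *a0 = a + (j + 0) * lda;
        const float *a1 = a + (j + 1) * lda;
        const float *a2 = a + (j + 2) * lda;
        const float *a3 = a + (j + 3) * lda;
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;

        for (size_t i = 0; i < m; i++) {
            float xi = x[i];
            s0 += a0[i] * xi;
            s1 += a1[i] * xi;
            s2 += a2[i] * xi;
            s3 += a3[i] * xi;
        }

        y[j + 0] += alpha * s0;
        y[j + 1] += alpha * s1;
        y[j + 2] += alpha * s2;
        y[j + 3] += alpha * s3;
    }

    /* Tail loop: remaining columns when n is not a multiple of 4. */
    for (; j < n; j++) {
        float sum = 0.0f;
        for (size_t i = 0; i < m; i++)
            sum += a[j * lda + i] * x[i];
        y[j] += alpha * sum;
    }
}
```

In an SVE kernel the inner loop would presumably operate on whole vectors under a predicate rather than on scalars, but the idea of keeping several independent accumulators per loaded chunk of x is the same.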

@martin-frbg
Collaborator

closing as resolved/implemented by your #4996 - thank you very much


3 participants