-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Neoverse V2 and Armv9 #79
Conversation
I'll follow with a PR referencing this on the archspec side and check whether tests passes. @fspiga Do you (or anybody from NVIDIA) want to double check this? |
I have staged changes for the similar type of enhancements in my local system and I have been testing things for few weeks. For GCC/GNU, re use of Re NVHPC, IIRC Re CLANG, I am not sure the It is worth to add Arm HPC Compiler in this mix. Support for V2 starts from version 23.04.1. We can have someone from Arm Ltd double-check this. In my working copy, anything that is below the reccomended versions for Grace goes to |
As far as I know, GCC only added support for
That would be great, thanks.
The documentation of both GCC and CLANG state that the default is 'scalable'. I could not very if this is included in the
According to the release notes, it was added to 23.04.0: https://developer.arm.com/documentation/107578/2304/?lang=en
I'd also expect that for any production workload, a recent compiler can be expected on Grace systems. But most other architectures specified here also add some tuning flags for older GCC versions as well. So in order to stay consistent, I took the 'neoverse_n1' specification as a guideline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fspiga Can you double check the few items that are on hold here? It would be good to have v2 support in the upcoming major release of Spack 🙂
cpu/microarchitectures.json
Outdated
}, | ||
{ | ||
"versions": "8.0:8.99", | ||
"flags" : "-march=armv8.4-a+sve -mtune=cortex-a72 -msve-vector-bits=128" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "-msve-vector-bits=128" should not be used in this entire PR. It is making vector-length specific (VLS) SVE which can only be guaranteed to run correctly if the platform has that SVE vector length. We need to produce binaries compiled on V2 that would not risk wrong behaviour on (eg) V1 where VL is 256 in SVE.
There is the warning in the GCC manual:
[..] At present ‘-msve-vector-bits=128’ also generates vector-length agnostic output for big-endian targets. All other values generate vector-length specific code. The behavior of these values may change in future releases and no value except ‘scalable’ should be relied on for producing code that is portable across different hardware SVE vector lengths.
Changes requested:
- Remove any reference to -msve-vector-length - it's not needed.
- For compilers with the V2 cost model (gcc 13, ACFL (arm) 23.04.0 and above) - replace whole line with "-mcpu=neoverse-v2". There is no need to specify "mtune" and "march" as "mcpu" sets both on aarch64 for Clang, ACfL, and GNU.
This PR for N1/V1 : #81 is the (simpler) pattern for compilers that do support a particular core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the argument for having comparability with V1. These architectures differ in their supported instruction sets, so the incompatibility goes beyond the sve vector width. But this flag is no longer included now due to the comments below.
* For compilers with the V2 cost model (gcc 13, ACFL (arm) 23.04.0 and above) - replace whole line with "-mcpu=neoverse-v2". There is no need to specify "mtune" and "march" as "mcpu" sets both on aarch64 for Clang, ACfL, and GNU.
The mcpu
flag is used whenever possible for the neoverse V2 architecture with all compilers already. Only in older compiler versions like the one you singled out, the maximum supported arm architecture flag subset is used. So I'm not sure I understand your comment here, since this is very similar to the N1/V1 specifications you are pointing to.
Hi all, (Simon - hi! I'm technology manager for the compiler teams at Arm).
Correct - this example https://godbolt.org/z/a7a3En6TT shows that in practice. Can I check: do we have a good reason for enabling width-specific flags here? If it's for a good reason (eg. we've explicitly written width-specific library code), then it's fine. If it's for performance reasons (eg. the compiler gets to assume another thing, so it should be faster, right?...), I'd be wary. We do almost all our optimization work on width-agnistic codegen. I'd go so far as to say: if you did have any examples where you saw a speed advantage with width-specific, please let me know, so we can put effort into fixing them. I recognise that, in this particular case, this framework knows it's only ever building for neoverse-v2, so it's probably a moot point. But note that we (Arm) are keen to proliferate vector-length-agnostic code wherever possible, to reduce the impact of future vector length incompatibility. Thanks! Will. |
@willlovett-arm @dslarm Can you suggest changesets to this PR (I assume they amount to removing |
Hi Will, thanks for providing feedback on this. I think the fact that your compiler optimization work targets width-agnostic codegen is a convincing argument for removing the fixed sve vector length. The intention was to provide as much optimization opportunities as possible for the compiler. One of our applications could benefit from fixed vector width, but that should not translate into (potentially) worse optimization in general. |
Thanks @AdhocMan and everyone involved in the discussion! |
Add support for Armv9 and Neoverse v2, as required for the Nvidia Grace CPU.