Feature Request: Implement Indirect Function hsa_executable_symbol_t and Its Info Query #176

Closed
matinraayai opened this issue Jan 1, 2024 · 5 comments

Comments

@matinraayai

According to the ROCr documentation, the features related to creating an indirect function symbol and querying its information are not implemented.
Are there any plans to implement them? If not, what issues are blocking this?

@t-tye
Contributor

t-tye commented Jan 12, 2024

In the HSA HSAIL spec there was the addition of indirect functions. These were never implemented so this support was never used. One of the things likely needed for this is a standard ABI for heterogeneous function pointers. For numerous reasons AMD GPU does not have a standard function ABI yet.

Interested in what you are looking to do with this?

@TimourPaltashev

Hi Matin, we need a more detailed justification of the potential usage of the requested features.

@matinraayai
Author

@t-tye @TimourPaltashev Thank you for the response. We want to use this in our binary instrumentation framework (called Luthier) for the following use cases:

  1. Internally, we use indirect functions heavily (e.g. indirect functions are the primary instrumentation payload we need to keep track of, and we need to know which indirect functions can be called from inside a kernel).
     Since there's no support for indirect functions in HSA, we have a wrapper around the HSA primitives that works around this by doing the following:
     a. It iterates over all symbols and locates all the STT_FUNC symbols.
     b. If any of those symbols has a kernel descriptor (KD) symbol associated with it, then it is an indirect function symbol.

To locate a single symbol (or all symbols) in an executable, we have to pay an additional lookup penalty to make sure we cover the indirect functions. Having HSA implement this feature would remove that penalty.
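The scan described in item 1 can be sketched as a single pass over the symbol table. This is only an illustration under stated assumptions, not Luthier's or ROCr's code: the `Symbol` record and `pair_functions_with_descriptors` are hypothetical names, the symbols are plain records rather than a parsed code object, and the `<name>.kd` naming of kernel descriptor symbols is assumed from the AMDGPU code object convention.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

STT_OBJECT = 1  # ELF symbol type: data object
STT_FUNC = 2    # ELF symbol type: function


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for one ELF symbol table entry."""
    name: str
    sym_type: int
    address: int


def pair_functions_with_descriptors(
    symbols: Iterable[Symbol],
) -> Dict[Symbol, Optional[Symbol]]:
    """Map every STT_FUNC symbol to its kernel descriptor symbol, or None.

    Assumption: the descriptor for function `foo` is named `foo.kd`.
    """
    # Index every symbol by name first; this is the extra lookup work
    # that has to happen before any function symbol can be classified.
    by_name = {s.name: s for s in symbols}
    return {
        s: by_name.get(s.name + ".kd")
        for s in by_name.values()
        if s.sym_type == STT_FUNC
    }


# Usage with made-up symbols: one function with a descriptor, one without.
symbols = [
    Symbol("kernel_a", STT_FUNC, 0x1000),
    Symbol("kernel_a.kd", STT_OBJECT, 0x400),
    Symbol("helper", STT_FUNC, 0x2000),
]
pairs = pair_functions_with_descriptors(symbols)
```

The result keeps both groups (function symbols with and without an associated descriptor), so a caller can apply whichever classification rule it needs on top of the pairing.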

  2. Externally, we want to let tool users know which indirect functions can potentially be called from the kernel that is about to be launched. But since ROCr doesn't implement this, we have to work around it by emulating hsa_executable_symbol_t ourselves, roughly as follows:
     a. Keep an internal record of the indirect functions located in each hsa_executable_t, populated as executables are queried during the life of the program.
     b. When we return an indirect function to the user, the address of the symbol (on the host or on the device, we haven't decided yet) is its 64-bit handle value.
     c. If the user passes us an hsa_executable_t, we first have to check whether it matches any of our internal records for indirect functions before passing it to HSA (to avoid invalid argument errors).
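A minimal sketch of the bookkeeping this emulation implies, assuming handles are plain 64-bit integers. `IndirectFunctionRegistry` and all of its methods are hypothetical names invented for illustration; nothing here is an HSA or Luthier API.

```python
from typing import Dict, Optional, Set


class IndirectFunctionRegistry:
    """Hypothetical record of emulated indirect function symbols."""

    def __init__(self) -> None:
        # executable handle -> emulated handles of the functions found in it
        self._by_executable: Dict[int, Set[int]] = {}
        # emulated symbol handle -> owning executable handle
        self._owner: Dict[int, int] = {}

    def record(self, executable: int, symbol_address: int) -> int:
        """Remember a function and return its emulated 64-bit handle."""
        handle = symbol_address & 0xFFFFFFFFFFFFFFFF  # the address is the handle
        self._by_executable.setdefault(executable, set()).add(handle)
        self._owner[handle] = executable
        return handle

    def owner_of(self, handle: int) -> Optional[int]:
        """Return the owning executable for an emulated handle, else None."""
        return self._owner.get(handle)

    def functions_in(self, executable: int) -> Set[int]:
        """Return a copy of the emulated handles recorded for an executable."""
        return set(self._by_executable.get(executable, ()))


# Usage with made-up handle values.
registry = IndirectFunctionRegistry()
handle = registry.record(executable=0x10, symbol_address=0x7F0000002000)
```

A handle that `owner_of` recognizes would be served from the internal records, and anything else would be forwarded to HSA unchanged. That check on every incoming handle is the extra burden the workaround places on the tool.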

This workaround is doable on our end, but it puts an additional burden on Luthier for something that seems a better fit for HSA.

I understand that, from the host side, there's little need for the runtime to know about indirect functions. But our use case relies heavily on them being identified as quickly as possible and exposed to the end user.

@t-tye
Contributor

t-tye commented Jan 15, 2024

Currently symbols are put into the dynsym table for kernel descriptors and the entry point of the kernel. The latter is not required in the symbol table as the language runtime only requires the kernel descriptors. A future compiler may stop putting the kernel entry point in the symbol table.

The HSA Runtime only returns the kernel descriptor symbols. There are no indirect functions being generated by the compiler.

The ABI for the kernel descriptor is specified at https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor . If you require the entry point of the kernel you can use the signed KERNEL_CODE_ENTRY_BYTE_OFFSET relative to the base address of the kernel descriptor.
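As a rough illustration of that calculation, the sketch below reads the signed 64-bit offset out of a raw descriptor and adds it to the descriptor's base address. The 64-byte descriptor size and the field position (bytes 16 to 23, little-endian) are assumptions taken from a reading of the linked ABI document and should be checked against it; `kernel_entry_address` is a hypothetical helper, not an HSA call.

```python
import struct

KERNEL_DESCRIPTOR_SIZE = 64      # bytes; assumed from the linked ABI document
ENTRY_OFFSET_FIELD_OFFSET = 16   # assumed byte position of KERNEL_CODE_ENTRY_BYTE_OFFSET


def kernel_entry_address(kd_address: int, kd_bytes: bytes) -> int:
    """Return the kernel entry point given a descriptor's address and contents."""
    if len(kd_bytes) < KERNEL_DESCRIPTOR_SIZE:
        raise ValueError("kernel descriptor is shorter than expected")
    # "<q" decodes a little-endian signed 64-bit integer, so negative
    # offsets (code placed before the descriptor) are handled correctly.
    (offset,) = struct.unpack_from("<q", kd_bytes, ENTRY_OFFSET_FIELD_OFFSET)
    return kd_address + offset


# Usage with a made-up descriptor whose code starts 0x300 bytes before it.
kd = bytearray(KERNEL_DESCRIPTOR_SIZE)
struct.pack_into("<q", kd, ENTRY_OFFSET_FIELD_OFFSET, -0x300)
entry = kernel_entry_address(0x1000, bytes(kd))  # 0x1000 - 0x300 == 0xD00
```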

Be aware that some targets support preloaded kernel arguments which results in there being 2 entry points to the kernel. See https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments .

@matinraayai
Author

I'm closing this in favor of #203.
