
Mi50 Support #29

Open
YehowshuaScaled opened this issue Dec 31, 2023 · 5 comments

Comments

@YehowshuaScaled

I was able to build flash-attention ROCm for both my Mi100 and Mi50 cards, but only got flash attention working on the Mi100 (very impressive performance, I might add).

Trying to run flash attention on the Mi50 delivered the following error:
RuntimeError: DeviceGroupedMultiheadAttentionForward_Xdl_CShuffle_V2<256, 128, 128, 32, 8, 8, 128, 128, 32, 2, Default, ASpecDefault, B0SpecDefault, B1SpecDefault, CSpecDefault, MaskUpperTriangleFromTopLeft> does not support this problem

How hard would it be to port FA to the Mi50? I'm happy to pay/hire for support on this, as I have a rather large stockpile of Mi50s.
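
For reference, a minimal sketch of the kind of call that triggers this (assuming the build exposes the usual flash_attn_func interface; the shapes and dtype below are illustrative):

```python
# Minimal sketch (illustrative shapes/dtype) exercising the flash-attention
# forward pass via the standard flash_attn_func interface.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Runs on the Mi100 (gfx908); on the Mi50 (gfx906) the CK backend raises the
# "does not support this problem" RuntimeError shown above.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```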

@dejay-vu

dejay-vu commented Jan 23, 2024

Hi @YehowshuaScaled. I think it would be better to ask the CK team to see if they are going to support MI50. It won't be an issue if they have FA kernels running on MI50.

RuntimeError: DeviceGroupedMultiheadAttentionForward_Xdl_CShuffle_V2<256, 128, 128, 32, 8, 8, 128, 128, 32, 2, Default, ASpecDefault, B0SpecDefault, B1SpecDefault, CSpecDefault, MaskUpperTriangleFromTopLeft> does not support this problem

This error is actually raised from the CK backend.

@differentprogramming

I noticed this line in setup.py:
allowed_archs = ["native", "gfx90a", "gfx908", "gfx940", "gfx941", "gfx942"]
I'm sad that gfx906 isn't there, since I have an MI50 as well.
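
For anyone tempted to experiment, the obvious (and unsupported) thing to try is adding gfx906 to that list, roughly as sketched below. This is purely speculative: even if the extension then builds, the CK kernels it dispatches to may still reject gfx906 at runtime, as the error above suggests.

```python
# Sketch only, not a tested patch: let gfx906 pass the arch check in setup.py.
# The underlying CK kernels may still not target gfx906, in which case the
# build either fails or raises the same "does not support this problem" error.
allowed_archs = ["native", "gfx906", "gfx90a", "gfx908", "gfx940", "gfx941", "gfx942"]
```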

@linchen111

I was able to build flash-attention ROCm for both my Mi100 and Mi50 cards, but only got flash attention working on the Mi100 (very impressive performance, I might add).

Trying to run flash attention on the Mi50 delivered the following error: RuntimeError: DeviceGroupedMultiheadAttentionForward_Xdl_CShuffle_V2<256, 128, 128, 32, 8, 8, 128, 128, 32, 2, Default, ASpecDefault, B0SpecDefault, B1SpecDefault, CSpecDefault, MaskUpperTriangleFromTopLeft> does not support this problem

How hard would it be to port FA to the Mi50? Happy to pay/hire for support on this as I have a rather large stockpile of Mi50s.

Did you solve this?

@Said-Akbar

Hello @YehowshuaScaled,

Did you find a solution for it? I have 2x MI60 cards.

@Said-Akbar

Said-Akbar commented Oct 31, 2024

Hi @jayz0123,

How hard is it to implement FA kernels for the MI60? Can you please point to the relevant scripts and documentation for making the changes? What knowledge is required to bring FA2 to the MI60? Does it depend only on Composable Kernel repo support?
A quick look at the CK repo shows references to gfx906 (MI60): https://github.com/search?q=repo%3AROCm%2Fcomposable_kernel%20gfx906&type=code
However, the official docs state they only support gfx908 and up: https://rocm.docs.amd.com/projects/composable_kernel/en/latest/tutorial/tutorial_hello_world.html#hardware-targets
I want to explore this and make the changes myself if it is not very complex (and does not require deep architecture knowledge).
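
As a first sanity check before touching CK, it may be worth confirming what architecture string the ROCm build of PyTorch actually reports for the MI60. A small sketch (gcnArchName is present on recent ROCm builds of PyTorch; the fallback handles builds without it):

```python
# Sketch: print the GPU architecture string as seen by PyTorch on ROCm.
# An MI50/MI60 should report something like "gfx906:sramecc+:xnack-".
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    arch = getattr(props, "gcnArchName", None)  # present on recent ROCm builds
    print(f"device {i}: {props.name} | arch: {arch or 'unknown (no gcnArchName)'}")
```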
