The open-source code on HuggingFace does not appear to implement matrix merging #80

Open
meteorlin opened this issue Aug 6, 2024 · 1 comment
Comments

@meteorlin

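Hello! I read through the code you open-sourced on HuggingFace, and the attention module does not appear to implement the merging (absorption) of the Q and K projection matrices described in the paper. Could you point out where the equivalent implementation is?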

@ZxAndJb

ZxAndJb commented Aug 25, 2024

I have the same question. In the open-source implementation on HuggingFace, k still has multiple heads, and k and v are still saved in the cache during inference, which is completely different from what the architecture section describes.
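
For readers following the thread, here is a minimal sketch of what "absorbing" the key projection into the query projection could mean. It is not the repository's code. All names (`W_q`, `W_dkv`, `W_uk`, `c_kv`) and sizes are hypothetical, it covers a single head, and it ignores positional encoding. It only checks that scoring queries against a cached low-rank latent gives the same attention logits as first materializing per-head keys.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def rand_matrix(rows, cols):
    """Random Gaussian matrix, used here as a stand-in for trained weights."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_model, d_latent, d_head, seq_len = 8, 4, 6, 5  # illustrative sizes only

# Hypothetical single-head weights (assumed names, not taken from the repo).
W_q = rand_matrix(d_model, d_head)      # query projection: hidden -> head dim
W_dkv = rand_matrix(d_model, d_latent)  # KV down-projection: hidden -> latent
W_uk = rand_matrix(d_latent, d_head)    # key up-projection: latent -> head dim

h = rand_matrix(seq_len, d_model)       # hidden states, one row per token
c_kv = matmul(h, W_dkv)                 # compressed latent; the only tensor cached in the absorbed form

# Naive path: materialize per-head keys, then compute q @ k^T.
q = matmul(h, W_q)
k = matmul(c_kv, W_uk)
scores_naive = matmul(q, transpose(k))

# Absorbed path: fold W_uk into the query projection once,
# then score directly against the cached latent.
W_q_absorbed = matmul(W_q, transpose(W_uk))  # shape: d_model x d_latent
q_latent = matmul(h, W_q_absorbed)
scores_absorbed = matmul(q_latent, transpose(c_kv))

# Largest elementwise difference between the two score matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(scores_naive, scores_absorbed)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

The two paths agree up to floating-point rounding because `(h W_q)(c_kv W_uk)^T = h (W_q W_uk^T) c_kv^T`. Under these assumptions, an implementation that keeps multi-head `k` and caches `k` and `v` corresponds to the naive path, which matches what the comments above observe.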
