I have the same question here. In the open-source implementation on Hugging Face, k still has multiple heads, and k and v are still cached during inference, which is quite different from what the architecture section of the paper describes.
Hello authors! I looked at the code you open-sourced on Hugging Face, and the attention part does not seem to implement the merging (absorption) of the Q and K projection matrices described in the paper. Could you point out where the equivalent implementation is?
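For anyone else hitting this thread: below is a minimal sketch of what the "absorbed" Q/K projections amount to algebraically, i.e. folding the key up-projection into the query side so attention scores are computed directly against the cached compressed latents. The names, shapes, and single-head setup here are illustrative assumptions and do not correspond to the released modeling code.

```python
import torch

# Illustrative dimensions (assumed, not taken from the released model config):
# hidden size, compressed latent dim, per-head dim.
d_model, d_c, d_head = 1024, 128, 64

# Up-projection matrices for one attention head (hypothetical names).
W_UQ = torch.randn(d_head, d_c)        # query up-projection from the query latent
W_UK = torch.randn(d_head, d_c)        # key up-projection from the KV latent

c_q  = torch.randn(d_c)                # compressed query latent for the current token
c_kv = torch.randn(5, d_c)             # cached compressed KV latents for 5 past tokens

# Naive path: materialize per-head q and k, then take dot products.
q = W_UQ @ c_q                         # (d_head,)
k = c_kv @ W_UK.T                      # (5, d_head)
scores_naive = k @ q                   # (5,)

# Absorbed path: since q^T k = c_q^T (W_UQ^T W_UK) c_kv, fold W_UK into the
# query side and score directly against the cached latents; per-head k is
# never formed and only the compressed latents need to be cached.
q_absorbed = W_UK.T @ (W_UQ @ c_q)     # (d_c,)
scores_absorbed = c_kv @ q_absorbed    # (5,)

print(torch.allclose(scores_naive, scores_absorbed, rtol=1e-3))  # True
```

The same argument applies on the value side, where the value up-projection can be folded into the output projection. Whether the released code actually performs this folding, or instead materializes per-head k/v as the commenters above observe, is exactly the question being asked here.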