
Do you plan to integrate this algorithm into the vllm project? #15

Open
Alienfeel opened this issue Oct 21, 2024 · 2 comments

Comments

@Alienfeel

As titled.

@RyeYuan commented Nov 19, 2024

I tried using SageAttention in the prefilling phase of vLLM, based on the latest 2.0.0 branch. However, after replacing FA2, it doesn't seem to have much effect: end-to-end throughput (tokens/s) remains almost the same.
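
For reference, a minimal sketch of the kind of drop-in swap described above, assuming SageAttention's public `sageattn` kernel (as documented in this repo's README) standing in for an FA2-style `flash_attn_func` call. The wrapper name, shapes, and hook point are illustrative assumptions, not the actual patch benchmarked:

```python
# Hypothetical sketch: adapting sageattn to a flash_attn_func-style
# interface, e.g. for a vLLM prefill attention backend. Illustrative only.
import torch
from sageattention import sageattn


def flash_attn_like(q: torch.Tensor,
                    k: torch.Tensor,
                    v: torch.Tensor,
                    causal: bool = True) -> torch.Tensor:
    """Mimic flash_attn_func's (batch, seq_len, heads, head_dim) layout.

    sageattn accepts the same layout with tensor_layout="NHD", so no
    transposes are needed; both kernels default the softmax scale to
    1/sqrt(head_dim).
    """
    return sageattn(q, k, v, tensor_layout="NHD", is_causal=causal)


# Prefill-shaped example: batch=1, 4096 tokens, 32 heads of dim 128.
q = torch.randn(1, 4096, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
out = flash_attn_like(q, k, v, causal=True)  # same shape as q
```

One caveat on interpreting the result: if end-to-end tokens/s is unchanged after such a swap, the prefill attention kernel is probably not the dominant cost in the measured workload (e.g., decode-heavy serving or short prompts would mask a prefill-only speedup).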

@jason-huang03 (Member)

We are discussing this with the SGLang team, and there is a possibility that SageAttention will be used in SGLang in the future.
