This repository has been archived by the owner on Dec 1, 2024. It is now read-only.

Support for MoE models (see Switch Transformer, NLLB) #109

Open
fiqas opened this issue Apr 18, 2023 · 0 comments

Comments


fiqas commented Apr 18, 2023

Hi, have you guys considered adding support for Mixture-of-Experts (MoE) models?
They're usually quite hefty in size, so they would be a great candidate for offloading parameters to CPU.

Examples:
Switch Transformers (https://huggingface.co/google/switch-base-256)
NLLB (https://github.com/facebookresearch/fairseq/tree/nllb/)
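
For illustration, here is a minimal sketch (assuming plain PyTorch; the class and parameter names are hypothetical, not this project's API) of the offloading pattern the request describes: expert weights stay in CPU memory, and only the expert(s) a batch actually routes to are copied to the GPU for each forward pass.

```python
import torch
import torch.nn as nn

class CPUOffloadedMoE(nn.Module):
    """Toy Switch-style MoE layer whose experts live in CPU memory (hypothetical sketch)."""

    def __init__(self, d_model, d_ff, num_experts, compute_device="cuda"):
        super().__init__()
        self.compute_device = compute_device
        # The router is tiny compared to the experts, so it stays resident on the compute device.
        self.router = nn.Linear(d_model, num_experts).to(compute_device)
        # Experts are created (and kept) on CPU; pinned host memory could speed up the copies.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (tokens, d_model) on the compute device.
        # Switch Transformer uses top-1 routing: each token goes to exactly one expert.
        expert_ids = self.router(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for eid in expert_ids.unique().tolist():
            mask = expert_ids == eid
            # Copy this expert to the compute device, run only its tokens, then move it
            # back so device memory stays bounded by roughly one expert at a time.
            expert = self.experts[eid].to(self.compute_device)
            out[mask] = expert(x[mask])
            self.experts[eid].to("cpu")
        return out

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = CPUOffloadedMoE(d_model=64, d_ff=256, num_experts=8, compute_device=device)
    tokens = torch.randn(32, 64, device=device)
    print(layer(tokens).shape)  # torch.Size([32, 64])
```

A real integration for the models above would presumably overlap the host-to-device copies with compute (e.g. via pinned memory and separate CUDA streams) rather than moving experts synchronously as this sketch does.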
