How can the model be trained on multiple GPUs with DeepSpeed? #22
Comments
There are still some problems at the moment. I am debugging it too and will post an update as soon as possible.
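Below is the main replacement code that got single-node multi-GPU DeepSpeed training working (it replaces trainer=LoRATrainer and everything after it):
Addendum:
Finally, change python to the deepspeed launcher in train.sh and that is all.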
No, conf is the DeepSpeed configuration, for example something like the following:
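As an illustration only (this is not the commenter's actual config), a minimal DeepSpeed configuration could be built and written out as sketched below. The keys are standard DeepSpeed config fields, but the values, the `write_ds_config` helper, and the `ds_config.json` file name are assumptions chosen for the example.

```python
import json

# Illustrative values only; tune them for your model and hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # batch size seen by each GPU per step
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},             # mixed-precision training
    "zero_optimization": {"stage": 2},     # shard optimizer state and gradients
}


def write_ds_config(path="ds_config.json"):
    """Write the config to disk so the training script can load it (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ds_config, f, indent=2)
    return path
```

The resulting JSON file (or the dict itself) is what conf would refer to when the trainer is constructed.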
Multi-GPU training fails with RuntimeError: Expected all tensors to be on the same device, but found at least two devices. Have you run into this?
When I hit this error, it came from the model loading step, that is, the model=xxxModel() call before this code. It may be worth checking whether model_device_map is correct.
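One way to run that check is sketched below, assuming the model is loaded through an API that accepts a Hugging Face style device map (where the key "" stands for the whole model) and that the launcher exports LOCAL_RANK for each process. Both helper names are hypothetical and not part of this repository.

```python
import os


def per_process_device_map():
    """Place the whole model ("" = root module) on this process's own GPU."""
    # Assumption: the launcher sets LOCAL_RANK per process; fall back to GPU 0
    # when the script is started with plain python.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return {"": local_rank}


def devices_used(device_map):
    """Return the set of distinct devices a device map assigns modules to."""
    return set(device_map.values())
```

If devices_used(model_device_map) contains more than one device inside a single process, the model is split across GPUs, which would be consistent with the error above when each process is expected to hold a full copy of the model.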
As the title says.