[Feature] Pipeline Parallelization of Different Stages in RLHF #877
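Option 1: Cite OpenRLHF and state that the technical copyright originates from OpenRLHF.
I cannot judge whether what you said is true (sorry, but after AI Lab seriously lied in the last discussion, it is hard to be convinced). A developer with no connection to Shanghai AI Lab building a pipeline optimization on top of an already closed MR sounds somewhat suspicious: who would contribute such a large MR to Shanghai AI Lab for no reason, and dig up a closed MR to do it? Even assuming, as stated above, that this was done out of personal interest, #736 is itself suspected of serious plagiarism, so this MR is not suitable for inclusion in XTuner unless it cites its sources rigorously and README.md states that the Ray- and vLLM-based RLHF technical design originates from OpenRLHF.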
Motivation
The RLHF process can be divided into three stages: Generation, Forward, and Train. In the Generation stage, responses are generated using vLLM. In the Forward stage, the actor, critic, reference, and reward models perform inference. In the Train stage, the actor and critic models are trained.
While one stage is executing, the GPUs assigned to the other stages sit idle, which wastes resources.
To address this issue, we can optimize the process by leveraging the concept of pipeline parallelism. The batch data is divided into multiple smaller micro-batches. After processing a micro-batch in one stage, the data is immediately passed to the next stage for processing, rather than waiting for the entire batch to be completed. This approach reduces the idle time of GPUs in each stage, thereby improving resource utilization.
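The effect of splitting a batch into micro-batches can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the scheduling used in this PR: the function names and the example stage durations are hypothetical, each stage is assumed to own its GPUs, stage time is assumed to scale linearly with micro-batch size, and data-transfer cost between stages is ignored.

```python
from typing import List


def sequential_makespan(stage_times: List[float]) -> float:
    """Total time when the whole batch finishes one stage before the next starts."""
    return sum(stage_times)


def pipelined_makespan(stage_times: List[float], num_micro_batches: int) -> float:
    """Total time when the batch is split into equal micro-batches.

    stage_times[s] is the (hypothetical) time stage s needs for the full batch.
    A stage starts a micro-batch as soon as (a) the previous stage has handed
    it over and (b) the stage itself has finished the previous micro-batch.
    """
    per_micro_batch = [t / num_micro_batches for t in stage_times]
    # finish[s]: time at which stage s completed its most recent micro-batch
    finish = [0.0] * len(per_micro_batch)
    for _ in range(num_micro_batches):
        ready = 0.0  # time at which this micro-batch leaves the previous stage
        for s, cost in enumerate(per_micro_batch):
            start = max(ready, finish[s])
            finish[s] = start + cost
            ready = finish[s]
    return finish[-1]


if __name__ == "__main__":
    # Hypothetical full-batch durations for Generation, Forward, Train.
    times = [6.0, 2.0, 4.0]
    print(sequential_makespan(times))    # 12.0
    print(pipelined_makespan(times, 2))  # 9.0
    print(pipelined_makespan(times, 4))  # 7.5
```

In this model the total time shrinks as the number of micro-batches grows, and it approaches the duration of the slowest stage. A real system would not reach that limit, because per-micro-batch overhead and inter-stage communication are ignored here.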
Modification
The code has been modified based on PR #736. The entrypoint file is `xtuner/rlhf/pipeline.py`.