ChatGLM can be fine-tuned with QLoRA, but ChatGLM2 reports an out-of-memory (OOM) error #29
Comments
Could you try a card with more VRAM? As I recall, the 2080 Ti has 11G. In my own tests it works fine on a 3060 (12G) and a 3090 (24G), and during training a batch size of 8 doesn't even fill the memory.
A 4090 with 24G of VRAM also OOMs.
@xslower Are you sure you are using the latest version of the model code? Also, how long are the samples in your dataset, and what batch size are you using?
It dies at the model-loading stage, before any batches are run. If I load the -int4 model directly, it errors out saying the weights cannot compute gradients. If I load the original model, GPU memory blows up right away. This is with the latest code. D:\Env\Python39\python.exe E:\code\gpt\chatGLM-6B-QLoRA\train_qlora.py
@xslower Don't load the int4 checkpoint directly; load the fp model and let it be quantized to int4 during loading. Do the versions of the relevant dependencies match the ones in the README?
I checked specifically: everything is >= the versions you list. For example, bitsandbytes is 0.40.2. I suspect it is a bitsandbytes problem. The 32-bit version of a 6B model by itself already needs >24G of VRAM, so if no quantization happens while the model is being loaded, it is bound to blow up. For normal inference you either load the int4 version or use the half() version.
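(For reference, "quantize while loading" here means passing a BitsAndBytesConfig to from_pretrained rather than loading an already-quantized -int4 checkpoint. A minimal sketch is below; the model path is a placeholder and this only mirrors what train_qlora.py is expected to do, it is not its exact code.)

```python
# Minimal sketch of 4-bit (NF4) loading with bitsandbytes via transformers.
# Assumes transformers>=4.30 and bitsandbytes>=0.39; the model path is a placeholder.
import torch
from transformers import AutoModel, BitsAndBytesConfig

q_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit while loading
    bnb_4bit_quant_type="nf4",              # NF4 quantization as used by QLoRA
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,   # compute dtype for the dequantized matmuls
)

model = AutoModel.from_pretrained(
    "/data/chatglm2-6b",                    # placeholder path to the fp16 checkpoint
    quantization_config=q_config,
    device_map="auto",
    trust_remote_code=True,
)
```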
Here is what I get:
2023-09-11 09:47:27.394959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
There is no error in this snippet, only warnings... or you didn't copy everything and missed the error part.
GPU hardware: 4x 2080 Ti, 12G VRAM per card, running on a single specified card.
Running chatglm-6b fine-tuning:
CUDA_VISIBLE_DEVICES=0 python train_qlora.py \
  --train_args_json chatGLM_6B_QLoRA.json \
  --model_name_or_path /data/chatglm-6b \
  --train_data_path data/train.jsonl \
  --eval_data_path data/eval.jsonl \
  --lora_rank 4 \
  --lora_dropout 0.05 \
  --compute_dtype fp32
This runs normally and saves its output under ./saved_files.
But running chatglm2-6b fine-tuning (the chatglm2-6b files are confirmed to be the latest version):
CUDA_VISIBLE_DEVICES=0 python train_qlora.py \
  --train_args_json chatGLM_6B_QLoRA.json \
  --model_name_or_path /data/chatglm2-6b \
  --train_data_path data/train.jsonl \
  --eval_data_path data/eval.jsonl \
  --lora_rank 4 \
  --lora_dropout 0.05 \
  --compute_dtype fp32
fails with the following error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
device_map='auto'
/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/softwares/anaconda3/envs/langchain did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/data/FT_LLM/chatGLM-6B-QLoRA/train_qlora.py:214 in │
│ │
│ 211 │
│ 212 if __name__ == "__main__": │
│ 213 │ args = parse_args() │
│ ❱ 214 │ train(args) │
│ 215 │
│ 216 │
│ │
│ /data/data/FT_LLM/chatGLM-6B-QLoRA/train_qlora.py:153 in train │
│ │
│ 150 │ # "output_layer": "cpu", │
│ 151 │ # } │
│ 152 │ │
│ ❱ 153 │ model = AutoModel.from_pretrained(global_args.model_name_or_path, │
│ 154 │ │ │ │ │ │ │ │ │ quantization_config=q_config, │
│ 155 │ │ │ │ │ │ │ │ │ device_map='auto', │
│ 156 │ │ │ │ │ │ │ │ │ trust_remote_code=True) │
│ │
│ /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/transformers/models/ │
│ auto/auto_factory.py:488 in from_pretrained │
│ │
│ 485 │ │ │ │ model_class.register_for_auto_class(cls.__name__) │
│ 486 │ │ │ else: │
│ 487 │ │ │ │ cls.register(config.__class__, model_class, exist_ok=True) │
│ ❱ 488 │ │ │ return model_class.from_pretrained( │
│ 489 │ │ │ │ pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, │
│ 490 │ │ │ ) │
│ 491 │ │ elif type(config) in cls.model_mapping.keys(): │
│ │
│ /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/transformers/modelin │
│ g_utils.py:2842 in from_pretrained │
│ │
│ 2839 │ │ │ │ │ key: device_map[key] for key in device_map.keys() if key not in modu │
│ 2840 │ │ │ │ } │
│ 2841 │ │ │ │ if "cpu" in device_map_without_lm_head.values() or "disk" in device_map │
│ ❱ 2842 │ │ │ │ │ raise ValueError( │
│ 2843 │ │ │ │ │ │ """ │
│ 2844 │ │ │ │ │ │ Some modules are dispatched on the CPU or the disk. Make sure yo │
│ 2845 │ │ │ │ │ │ the quantized model. If you want to dispatch the model on the CP │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
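(For reference, the offload route this ValueError suggests would look roughly like the sketch below. The max_memory limits and the model path are placeholders, not tested values; in recent transformers the flag is exposed on BitsAndBytesConfig as llm_int8_enable_fp32_cpu_offload. Offloaded layers run much slower, so this is only a workaround when the GPU really cannot hold the quantized model.)

```python
# Sketch of the CPU-offload workaround suggested by the ValueError above.
# max_memory limits and the model path are placeholders, not tested values.
import torch
from transformers import AutoModel, BitsAndBytesConfig

q_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,    # let modules dispatched to CPU stay in fp32
)

model = AutoModel.from_pretrained(
    "/data/chatglm2-6b",                      # placeholder path
    quantization_config=q_config,
    device_map="auto",
    max_memory={0: "10GiB", "cpu": "30GiB"},  # cap GPU 0 so the overflow goes to CPU RAM
    trust_remote_code=True,
)
```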
Commenting out device_map='auto' in this call in train_qlora.py:
model = AutoModel.from_pretrained(global_args.model_name_or_path,
                                  quantization_config=q_config,
                                  device_map='auto',
                                  trust_remote_code=True)
instead produces an OOM error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/softwares/anaconda3/envs/langchain did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
Loading checkpoint shards: 71%|██████████████████████████████████████████████████████████████▊ | 5/7 [00:14<00:05, 2.91s/it]
OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB (GPU 0; 10.75 GiB total capacity; 10.08 GiB already allocated; 142.50 MiB free;
10.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See
documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
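(The allocator hint at the end of that trace refers to the PYTORCH_CUDA_ALLOC_CONF environment variable. It only helps when reserved memory greatly exceeds allocated memory, i.e. fragmentation, which does not look like the case here since the card is simply full, but for completeness it is set roughly as below; the 128 MB value is just an illustrative choice.)

```python
# Illustrative only: the allocator setting mentioned at the end of the OOM message.
# It must be set before CUDA is initialized; 128 MB is an arbitrary example value.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch (and do any CUDA work) only after the env var is set
```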
Is there a corresponding fix for chatglm2?