BrokenPipeError: [Errno 32] Broken pipe #40
Comments
Run pip uninstall wandb if you don't use it. |
I need to use it, but I ran into 'BrokenPipeError: [Errno 32] Broken pipe'. I set num_workers to 0 or 1, and the problem still exists. My environment is fully set up. |
@shuxueslpi Thank you, I have solved the issue. I would also like to ask about your environment, speed, and data amount, because my training is relatively slow in the environment specified in the document: it was 40s/it, and it is now slightly better at 15s/it. I don't know whether that is normal. |
I wonder how you solved it. The wandb BrokenPipeError has been torturing me the whole day. Help me if you can. |
I solved the issue by downgrading wandb to 0.13.1. |
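Before pinning a version, it can help to confirm which wandb release is actually installed in the environment that runs the training script. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration, and the only version numbers known from this thread are 0.15.10 (from the log) and 0.13.1 (from the comment above).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("wandb")
    if version is None:
        print("wandb is not installed")
    else:
        print(f"wandb {version} is installed")
```

If the reported version is not the one you expect, pip install wandb==0.13.1 is the usual pip syntax for pinning. Whether that version avoids the error in other setups is only what the commenter reported, not something this thread verifies.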
Thank you, I solved the problem by uninstalling wandb. |
Thank you. Please excuse my English.
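For readers who want to keep wandb installed, an alternative to uninstalling may be to switch it off through the WANDB_MODE environment variable, which wandb reads at start-up ("offline" keeps run data local, "disabled" turns wandb calls into no-ops). This is a sketch under the assumption that the variable is set before wandb is initialised; the helper name set_wandb_mode is hypothetical, and the thread does not confirm that either mode removes the BrokenPipeError.

```python
import os


def set_wandb_mode(mode: str = "offline") -> None:
    """Set WANDB_MODE for the current process.

    Must run before wandb is initialised (for example at the very top of the
    training script), otherwise the setting may be ignored.
    """
    allowed = {"offline", "disabled"}
    if mode not in allowed:
        raise ValueError(f"mode must be one of {sorted(allowed)}, got {mode!r}")
    os.environ["WANDB_MODE"] = mode


if __name__ == "__main__":
    set_wandb_mode("disabled")
    print("WANDB_MODE =", os.environ["WANDB_MODE"])
```

The same setting can be given from the shell by prefixing the training command with WANDB_MODE=disabled. With the Hugging Face Trainer, setting report_to to "none" in the training arguments is another option, assuming the JSON config used by train_qlora.py is passed through to TrainingArguments.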
When I run CUDA_VISIBLE_DEVICES=1 python3 train_qlora.py --train_args_json chatGLM_6B_QLoRA.json --model_name_or_path /T106/chatGLM-6B-QLoRA-main/chatGLM-6B-QLoRA-main/remote_scripts/chatglm-6b/ --train_data_path data/train.jsonl --eval_data_path data/dev.jsonl --lora_rank 4 --lora_dropout 0.05 --compute_dtype fp32
I get this error:
(base) root@461jc47ml0du4-0:/T106/chatGLM-6B-QLoRA-main/chatGLM-6B-QLoRA-main# CUDA_VISIBLE_DEVICES=1 python3 train_qlora.py --train_args_json chatGLM_6B_QLoRA.json --model_name_or_path /T106/chatGLM-6B-QLoRA-main/chatGLM-6B-QLoRA-main/remote_scripts/chatglm-6b/ --train_data_path data/train.jsonl --eval_data_path data/dev.jsonl --lora_rank 4 --lora_dropout 0.05 --compute_dtype fp32
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:11<00:00, 1.38s/it]
trainable params: 1,835,008 || all params: 6,175,121,408 || trainable%: 0.029716144489446126
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-7be044e55537389d/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 425.26it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-7be044e55537389d/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-b1ad1cf49d010a09.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/json/default-7be044e55537389d/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-7f22050519838b48.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-7be044e55537389d/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-82c52662d9060a3a.arrow
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-15aa3e3de12fc81f/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1132.07it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-15aa3e3de12fc81f/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-d76e708a4953afce.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/json/default-15aa3e3de12fc81f/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-466cf4ff38bae650.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-15aa3e3de12fc81f/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-74e21cf8bae50992.arrow
wandb: Currently logged in as: 2315553823 (fky_hbj). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.10
wandb: Run data is saved locally in /T106/chatGLM-6B-QLoRA-main/chatGLM-6B-QLoRA-main/wandb/run-20230920_070236-anybjcsm
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run kind-sunset-17
wandb: ⭐️ View project at https://wandb.ai/fky_hbj/huggingface
wandb: 🚀 View run at https://wandb.ai/fky_hbj/huggingface/runs/anybjcsm
0%| | 0/3581 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
0%| | 2/3581 [00:25<12:28:26, 12.55s/it]
Exception in thread NetStatThr:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 267, in check_network_status
self._loop_check_status(
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 223, in _loop_check_status
Exception in thread IntMsgThr:
local_handle = request()
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 735, in deliver_network_status
File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
return self._deliver_network_status(status)
self.run()
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 466, in _deliver_network_status
File "/opt/conda/lib/python3.8/threading.py", line 870, in run
return self._deliver_record(record)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 425, in _deliver_record
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 299, in check_internal_messages
handle = mailbox._deliver_record(record, interface=self)
self._loop_check_status(
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 223, in _loop_check_status
local_handle = request()
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 743, in deliver_internal_messages
return self._deliver_internal_messages(internal_message)
interface._publish(record)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 472, in _deliver_internal_messages
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
return self._deliver_record(record)
self.send_server_request(server_req)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 425, in _deliver_record
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
handle = mailbox._deliver_record(record, interface=self)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
self._send_message(msg)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
interface._publish(record)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
self._sock_client.send_record_publish(record)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
3%|███▍ | 100/3581 [18:37<10:43:05, 11.08s/it]
Traceback (most recent call last):
File "train_qlora.py", line 206, in <module>
train(args)
File "train_qlora.py", line 200, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1927, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2240, in _maybe_log_save_evaluate
self.log(logs)
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2595, in log
self.control = self.callback_handler.on_log(self.args, self.state, self.control, logs)
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer_callback.py", line 399, in on_log
return self.call_event("on_log", args, state, control, logs=logs)
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer_callback.py", line 406, in call_event
result = getattr(callback, event)(
File "/opt/conda/lib/python3.8/site-packages/transformers/integrations/integration_utils.py", line 803, in on_log
self._wandb.log({**logs, "train/global_step": state.global_step})
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 419, in wrapper
return func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 370, in wrapper_fn
return func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 360, in wrapper
return func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1792, in log
self._log(data=data, step=step, commit=commit)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1567, in _log
self._partial_history_callback(data, step, commit)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1439, in _partial_history_callback
self._backend.interface.publish_partial_history(
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 546, in publish_partial_history
self._publish_partial_history(partial_history)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 89, in _publish_partial_history
self._publish(rec)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
wandb: While tearing down the service manager. The following error has occurred: [Errno 32] Broken pipe
I wonder whether wandb is the cause of this problem.
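The last traceback shows the BrokenPipeError travelling from the wandb log call up through Trainer.log and ending the run at step 100. One generic way to keep a failed metrics backend from stopping training is to wrap the logging call and switch logging off after the first broken pipe. This is only a sketch of that idea: SafeLogger and its methods are hypothetical names, it is not how transformers or wandb handle the error, and wiring it into the Trainer is not shown.

```python
from typing import Any, Callable, Dict


class SafeLogger:
    """Wrap a logging callable so a broken pipe disables logging instead of raising."""

    def __init__(self, log_fn: Callable[[Dict[str, Any]], None]) -> None:
        self._log_fn = log_fn
        self.enabled = True

    def log(self, metrics: Dict[str, Any]) -> bool:
        """Forward metrics to the backend; return True if they were delivered."""
        if not self.enabled:
            return False
        try:
            self._log_fn(metrics)
            return True
        except BrokenPipeError as exc:
            # The other end of the connection is gone; stop trying to log
            # so the caller (for example a training loop) can continue.
            self.enabled = False
            print(f"logging disabled after error: {exc}")
            return False


if __name__ == "__main__":
    def always_broken(metrics: Dict[str, Any]) -> None:
        # Stand-in for a backend whose socket has been closed.
        raise BrokenPipeError(32, "Broken pipe")

    logger = SafeLogger(always_broken)
    logger.log({"loss": 1.0})
    print("still running, logging enabled:", logger.enabled)
```

The trade-off is that metrics after the failure are silently dropped, so this suits runs where finishing training matters more than complete logs.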