
When I run "bash scripts/perfect.sh", I get this error; please help me solve it #4

Open
emmayouyou opened this issue Aug 24, 2022 · 6 comments

Comments

@emmayouyou

Traceback (most recent call last):
  File "run_clm.py", line 517, in <module>
    main()
  File "run_clm.py", line 427, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/transformers/trainer.py", line 1340, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/transformers/trainer.py", line 1445, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 133, in evaluate
    output = self.eval_loop(
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 185, in eval_loop
    metrics = self.compute_pet_metrics(eval_datasets, model, self.extra_info[metric_key_prefix])
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 210, in compute_pet_metrics
    centroids = self._compute_per_token_train_centroids(model)
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 281, in _compute_per_token_train_centroids
    data = get_label_samples(self.train_dataset, label)
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 278, in get_label_samples
    return dataset.filter(lambda example: int(example['labels']) == label)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 470, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/fingerprint.py", line 406, in wrapper
    out = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2519, in filter
    indices = self.map(
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2036, in map
    return self._map_single(
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 503, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 470, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/fingerprint.py", line 406, in wrapper
    out = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2248, in _map_single
    return Dataset.from_file(cache_file_name, info=info, split=self.split)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 654, in from_file
    return cls(
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 593, in __init__
    self.info.features = self.info.features.reorder_fields_as(inferred_features)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/features/features.py", line 1092, in reorder_fields_as
    return Features(recursive_reorder(self, other))
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/features/features.py", line 1081, in recursive_reorder
    raise ValueError(f"Keys mismatch: between {source} and {target}" + stack_position)
ValueError: Keys mismatch: between {'indices': Value(dtype='uint64', id=None)} and {'candidates_ids': Sequence(feature=Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None), length=-1, id=None), 'labels': Value(dtype='int64', id=None), 'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None), 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None), 'extra_fields': {}}
0%| | 2/6000 [08:36<430:29:39, 258.38s/it]
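For context, the "Keys mismatch" above happens because `filter()` falls back to a cached Arrow file written for a different operation (the bug tracked in huggingface/datasets#2943). The sketch below is illustrative only, not the actual `datasets` internals: it shows how a fingerprint-keyed cache can hand back a stale indices-only table where a full feature table is expected.

```python
# Minimal sketch of the failure mode, under the assumption (from
# huggingface/datasets#2943) that a fingerprint-keyed cache hands back a
# stale table: filter() first materializes an 'indices' table, and a later
# lookup with a colliding fingerprint reloads that table instead of
# recomputing, so the cached schema no longer matches the expected features.
cache = {}

def cached(fingerprint, compute):
    """Return the cached table for this fingerprint, computing it on a miss."""
    if fingerprint not in cache:
        cache[fingerprint] = compute()
    return cache[fingerprint]

# First call caches the indices-only table produced while filtering.
indices_table = cached("fp1", lambda: {"indices": [0, 2]})

# Colliding fingerprint: we expect a feature table but get the stale
# indices table back, i.e. the "Keys mismatch" in the traceback above.
features_table = cached("fp1", lambda: {"input_ids": [[1], [2]], "labels": [0, 1]})
print(sorted(features_table))  # ['indices']
```

The real library keys its cache files on a content fingerprint, so the fix reported below (upgrading `datasets`) resolves the collision rather than working around it.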

@vkgo

vkgo commented Sep 4, 2022

Did you solve this problem? I got the same error at step 200/6000 of training.

@saga1214

Any progress so far? I hit the same issue at step 200/6000, and it seems independent of the task (I've tried rte and sst2) and the model (roberta-large, roberta-base).

@rabeeh-karimi

rabeeh-karimi commented Nov 14, 2022

Hi all, @vkgo @saga1214 @emmayouyou,

This is caused by a bug in the Hugging Face datasets library; see huggingface/datasets#2943. They must have changed something in the library in the meantime.

To solve it, I uninstalled datasets and re-installed it. The version is now 2.6.1, and with it I no longer get the error:

 pip install datasets==2.6.1

Rabeeh

@e0397123

e0397123 commented Dec 5, 2022

Hi, after changing the datasets version to 2.6.1, I encountered the error below when running the code on QQP or QNLI. This error does not occur with datasets version 0.13.0, but with 0.13.0 the error reported above is raised instead. Is there any way to work around this?

Traceback (most recent call last):
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2961, in _map_single
    writer.write(example)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 467, in write
    self.write_examples_on_file()
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 425, in write_examples_on_file
    self.write_batch(batch_examples=batch_examples)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 527, in write_batch
    pa_table = pa.Table.from_arrays(arrays, schema=schema)
  File "pyarrow/table.pxi", line 3592, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 2785, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 4 named extra_fields expected length 1000 but got length 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_clm.py", line 517, in <module>
    main()
  File "run_clm.py", line 377, in main
    predict_dataset = predict_dataset.map(
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2572, in map
    return self._map_single(
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 584, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 551, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/fingerprint.py", line 480, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2991, in _map_single
    writer.finalize()
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 554, in finalize
    self.write_examples_on_file()
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 425, in write_examples_on_file
    self.write_batch(batch_examples=batch_examples)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 527, in write_batch
    pa_table = pa.Table.from_arrays(arrays, schema=schema)
  File "pyarrow/table.pxi", line 3592, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 2785, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 4 named extra_fields expected length 1000 but got length 0
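For readers hitting this: the `ArrowInvalid` above is the Arrow writer enforcing that every column in a batch has the same number of rows. The sketch below is a simplified stand-in, not pyarrow itself, showing the invariant that `pa.Table.from_arrays` checks; here `extra_fields` is `{}` for each example, so its column arrives empty while the others have 1000 rows.

```python
# Simplified sketch (not pyarrow itself) of the invariant enforced by
# pa.Table.from_arrays: every column array must have the same length.
def from_arrays(arrays, names):
    lengths = {name: len(arr) for name, arr in zip(names, arrays)}
    expected = max(lengths.values())
    for name, n in lengths.items():
        if n != expected:
            raise ValueError(
                f"Column named {name} expected length {expected} but got length {n}")
    return dict(zip(names, arrays))

# 'labels' has 1000 rows but 'extra_fields' arrives empty, as in the log above.
try:
    from_arrays([[0] * 1000, []], ["labels", "extra_fields"])
    msg = "no error"
except ValueError as err:
    msg = str(err)
print(msg)  # Column named extra_fields expected length 1000 but got length 0
```

So the underlying problem is that the tokenization step emits no `extra_fields` values for these tasks, not anything specific to Arrow.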

@rabeeh-karimi

rabeeh-karimi commented Dec 5, 2022

Hi,
If you remove script_version="master" in this line:

return load_dataset('glue', self.task, script_version="master")

and update to the latest version of datasets, which I think is 2.7.1 now, do you still get it?

@e0397123

e0397123 commented Dec 5, 2022

Hi, if you remove script_version="master" in this line

return load_dataset('glue', self.task, script_version="master")

and update to the latest version of datasets, which I think is 2.7.1 now, do you still get it?

Yes, the error still occurs. I notice that it happens for QNLI and QQP when tokenizing the predict dataset; for MRPC there is no such error.
