
When I run "bash scripts/perfect.sh", I get this error; please help me solve it #4

Open
emmayouyou opened this issue Aug 24, 2022 · 6 comments

Comments

@emmayouyou

Traceback (most recent call last):
  File "run_clm.py", line 517, in <module>
    main()
  File "run_clm.py", line 427, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/transformers/trainer.py", line 1340, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/transformers/trainer.py", line 1445, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 133, in evaluate
    output = self.eval_loop(
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 185, in eval_loop
    metrics = self.compute_pet_metrics(eval_datasets, model, self.extra_info[metric_key_prefix])
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 210, in compute_pet_metrics
    centroids = self._compute_per_token_train_centroids(model)
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 281, in _compute_per_token_train_centroids
    data = get_label_samples(self.train_dataset, label)
  File "/Users/01119378/Documents/2022/perfect-main/fewshot/third_party/trainers/trainer.py", line 278, in get_label_samples
    return dataset.filter(lambda example: int(example['labels']) == label)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 470, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/fingerprint.py", line 406, in wrapper
    out = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2519, in filter
    indices = self.map(
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2036, in map
    return self._map_single(
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 503, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 470, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/fingerprint.py", line 406, in wrapper
    out = func(self, *args, **kwargs)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2248, in _map_single
    return Dataset.from_file(cache_file_name, info=info, split=self.split)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 654, in from_file
    return cls(
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 593, in __init__
    self.info.features = self.info.features.reorder_fields_as(inferred_features)
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/features/features.py", line 1092, in reorder_fields_as
    return Features(recursive_reorder(self, other))
  File "/Applications/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/features/features.py", line 1081, in recursive_reorder
    raise ValueError(f"Keys mismatch: between {source} and {target}" + stack_position)
ValueError: Keys mismatch: between {'indices': Value(dtype='uint64', id=None)} and {'candidates_ids': Sequence(feature=Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None), length=-1, id=None), 'labels': Value(dtype='int64', id=None), 'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None), 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None), 'extra_fields': {}}
0%| | 2/6000 [08:36<430:29:39, 258.38s/it]
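For context, the "Keys mismatch" above happens because `filter()` falls back to a cached Arrow file written for a different operation (the bug tracked in huggingface/datasets#2943). The sketch below is illustrative only, not the actual `datasets` internals: it shows how a fingerprint-keyed cache can hand back a stale indices-only table where a full feature table is expected.

```python
# Minimal sketch of the failure mode, under the assumption (from
# huggingface/datasets#2943) that a fingerprint-keyed cache hands back a
# stale table: filter() first materializes an 'indices' table, and a later
# lookup with a colliding fingerprint reloads that table instead of
# recomputing, so the cached schema no longer matches the expected features.
cache = {}

def cached(fingerprint, compute):
    """Return the cached table for this fingerprint, computing it on a miss."""
    if fingerprint not in cache:
        cache[fingerprint] = compute()
    return cache[fingerprint]

# First call caches the indices-only table produced while filtering.
indices_table = cached("fp1", lambda: {"indices": [0, 2]})

# Colliding fingerprint: we expect a feature table but get the stale
# indices table back, i.e. the "Keys mismatch" in the traceback above.
features_table = cached("fp1", lambda: {"input_ids": [[1], [2]], "labels": [0, 1]})
print(sorted(features_table))  # ['indices']
```

The real library keys its cache files on a content fingerprint, so the fix reported below (upgrading `datasets`) resolves the collision rather than working around it.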

@vkgo

vkgo commented Sep 4, 2022

Did you solve this problem? I got the same error at step 200/6000 of training.

@saga1214

Any progress so far? I hit the same issue at step 200/6000, and it seems independent of the task (I've tried rte and sst2) and the model (roberta-large, roberta-base).

@rabeeh-karimi

rabeeh-karimi commented Nov 14, 2022

Hi all, @vkgo @saga1214 @emmayouyou,

This is caused by a bug in the Hugging Face datasets library; see huggingface/datasets#2943. They must have changed something in the library in the meantime.

To solve it, I uninstalled datasets and re-installed it. The version is now 2.6.1, and with it I no longer get the error:

 pip install datasets==2.6.1

Rabeeh

@e0397123

e0397123 commented Dec 5, 2022

Hi, after changing the datasets version to 2.6.1, I encountered the error below when running the code on QQP or QNLI. This error does not occur with datasets version 0.13.0, but with 0.13.0 the error reported above is raised instead. Is there any way to work around this?

Traceback (most recent call last):
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2961, in _map_single
    writer.write(example)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 467, in write
    self.write_examples_on_file()
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 425, in write_examples_on_file
    self.write_batch(batch_examples=batch_examples)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 527, in write_batch
    pa_table = pa.Table.from_arrays(arrays, schema=schema)
  File "pyarrow/table.pxi", line 3592, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 2785, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 4 named extra_fields expected length 1000 but got length 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_clm.py", line 517, in <module>
    main()
  File "run_clm.py", line 377, in main
    predict_dataset = predict_dataset.map(
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2572, in map
    return self._map_single(
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 584, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 551, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/fingerprint.py", line 480, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2991, in _map_single
    writer.finalize()
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 554, in finalize
    self.write_examples_on_file()
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 425, in write_examples_on_file
    self.write_batch(batch_examples=batch_examples)
  File "/home/chen/anaconda3/envs/perfect/lib/python3.8/site-packages/datasets/arrow_writer.py", line 527, in write_batch
    pa_table = pa.Table.from_arrays(arrays, schema=schema)
  File "pyarrow/table.pxi", line 3592, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 2785, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 4 named extra_fields expected length 1000 but got length 0
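For readers hitting this: the `ArrowInvalid` above is the Arrow writer enforcing that every column in a batch has the same number of rows. The sketch below is a simplified stand-in, not pyarrow itself, showing the invariant that `pa.Table.from_arrays` checks; here `extra_fields` is `{}` for each example, so its column arrives empty while the others have 1000 rows.

```python
# Simplified sketch (not pyarrow itself) of the invariant enforced by
# pa.Table.from_arrays: every column array must have the same length.
def from_arrays(arrays, names):
    lengths = {name: len(arr) for name, arr in zip(names, arrays)}
    expected = max(lengths.values())
    for name, n in lengths.items():
        if n != expected:
            raise ValueError(
                f"Column named {name} expected length {expected} but got length {n}")
    return dict(zip(names, arrays))

# 'labels' has 1000 rows but 'extra_fields' arrives empty, as in the log above.
try:
    from_arrays([[0] * 1000, []], ["labels", "extra_fields"])
    msg = "no error"
except ValueError as err:
    msg = str(err)
print(msg)  # Column named extra_fields expected length 1000 but got length 0
```

So the underlying problem is that the tokenization step emits no `extra_fields` values for these tasks, not anything specific to Arrow.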

@rabeeh-karimi

rabeeh-karimi commented Dec 5, 2022

Hi,
If you remove script_version="master" in this line:

return load_dataset('glue', self.task, script_version="master")

and update to the latest version of datasets, which I think is 2.7.1 now, do you still get it?

@e0397123

e0397123 commented Dec 5, 2022

Hi, if you remove script_version="master" in this line

return load_dataset('glue', self.task, script_version="master")

and update to the latest version of datasets, which I think is 2.7.1 now, do you still get it?

Yes, the error still occurs. I notice that it happens for QNLI and QQP when tokenizing the predict dataset; for MRPC there is no such error.
