Hi! I am using transformers 4.34 and tiktoken 0.4.0. I am trying to download the tokenizer for CodeGen 2.5, but when I run the commands from the tutorial, I get the following error:
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-mono", trust_remote_code=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 738, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2045, in from_pretrained
return cls._from_pretrained(
File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 136, in __init__
super().__init__(
File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 366, in __init__
self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
File "/home/velocity/miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 462, in _add_tokens
current_vocab = self.get_vocab().copy()
File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 153, in get_vocab
vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 149, in vocab_size
return self.encoder.n_vocab
AttributeError: 'CodeGen25Tokenizer' object has no attribute 'encoder'
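From the traceback, this looks like an initialization-order problem: in transformers 4.34 the base tokenizer's __init__ calls _add_tokens(), which calls get_vocab() and vocab_size, and those reach for self.encoder before the subclass has had a chance to set it. A minimal sketch of the pattern as I understand it (class names here are hypothetical, not the actual transformers or CodeGen 2.5 code):

# Minimal sketch of the initialization-order issue the traceback seems to show.
# Hypothetical classes, only meant to mirror the call chain in the error above.

class BaseTokenizer:
    def __init__(self):
        # transformers 4.34's PreTrainedTokenizer.__init__ calls _add_tokens(),
        # which in turn calls get_vocab() on the subclass.
        self.get_vocab()


class CustomTokenizer(BaseTokenizer):
    def __init__(self):
        super().__init__()        # get_vocab() is invoked here ...
        self.encoder = {"a": 0}   # ... but self.encoder is only assigned after that call

    def get_vocab(self):
        # Same shape as get_vocab()/vocab_size in tokenization_codegen25.py
        return dict(self.encoder)  # -> AttributeError: no attribute 'encoder'


CustomTokenizer()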
I tried deleting the cache, but that doesn't seem to help. Running tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-mono") without trust_remote_code=True instead gives ValueError: Tokenizer class CodeGen25Tokenizer does not exist or is not currently imported.
Has anyone else encountered this issue, and if so, how can I solve it? Thank you so much!
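One workaround I am considering, though I have not verified it, is pinning transformers to a release before 4.34 (e.g. pip install "transformers<4.34"), since the error seems tied to the 4.34 tokenizer init changes. This is just the check I would run after downgrading, not a confirmed fix:

# Untested guess: with transformers pinned below 4.34, the same call is expected to work.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Salesforce/codegen25-7b-mono", trust_remote_code=True
)
print(tokenizer.tokenize("def hello_world():"))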