Token translation for language models
- Documentation: https://ipieter.github.io/transtokenizer
- GitHub: https://github.com/ipieter/transtokenizer
- PyPI: https://pypi.org/project/trans-tokenizers/
- Licence: MIT
from transformers import AutoModelForCausalLM, AutoTokenizer
from transtokenizers import transform_model

# Load the tokenizer the model was trained with and the tokenizer to switch to.
source_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
target_tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-2023-dutch-base")

# Load the source model, then convert it to work with the target tokenizer.
source_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
target_model = transform_model(source_model, source_tokenizer=source_tokenizer, target_tokenizer=target_tokenizer)
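
To give an intuition for what converting a model to a new tokenizer can involve, here is a toy sketch in plain Python. It is not the library's implementation. It assumes that token translation initialises each target-token embedding as a weighted average of the embeddings of the source tokens it maps to, and that unmapped target tokens fall back to the mean source embedding. The function `remap_embeddings`, the `mapping` structure, and the fallback rule are all hypothetical names and choices made for illustration.

```python
def remap_embeddings(source_embeddings, mapping, target_vocab_size):
    """Build target embeddings as weighted averages of source embeddings.

    source_embeddings: list of vectors, one per source token id.
    mapping: dict of target token id -> list of (source token id, weight).
    Hypothetical helper for illustration only, not part of transtokenizers.
    """
    dim = len(source_embeddings[0])

    # Assumed fallback for target tokens without a mapping: the mean source vector.
    fallback = [
        sum(row[d] for row in source_embeddings) / len(source_embeddings)
        for d in range(dim)
    ]
    target_embeddings = [list(fallback) for _ in range(target_vocab_size)]

    for target_id, candidates in mapping.items():
        # Normalise the weights so they sum to 1 for each target token.
        total = sum(weight for _, weight in candidates)
        target_embeddings[target_id] = [
            sum(weight / total * source_embeddings[source_id][d]
                for source_id, weight in candidates)
            for d in range(dim)
        ]
    return target_embeddings


# Toy data: three source tokens with 2-dimensional embeddings.
source_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Target token 0 maps mostly to source token 0; target token 1 maps to source token 2.
mapping = {0: [(0, 3.0), (1, 1.0)], 1: [(2, 1.0)]}

target_embeddings = remap_embeddings(source_embeddings, mapping, target_vocab_size=3)
print(target_embeddings[0])  # [0.75, 0.25]
```

In this sketch, target token 2 has no mapping and therefore receives the mean of the source embeddings. A real conversion would also need to handle the output layer and special tokens, which this example leaves out.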