How do Large Language Models Handle Multilingualism? @Nikita_Okhotnikov |
2024 |
Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing |
arXiv preprint |
- |
Authors define "language-specific neurons" that dramatically affect performance on a single language, then fine-tune these neurons on a small training corpus, gaining a noticeable performance uplift.
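
A minimal sketch of the general idea (not the authors' exact detection procedure): rank neurons by how much more strongly they activate on one language's corpus than on the others. The activation-collection step, array shapes, and threshold ratio below are assumptions.
```python
# Hypothetical illustration: flag neurons whose mean |activation| on one
# language's corpus dominates their mean |activation| on all other languages.
import numpy as np

def language_specific_neurons(acts_by_lang: dict, ratio: float = 5.0) -> dict:
    """acts_by_lang maps a language code to an array of shape (n_tokens, n_neurons)
    holding activations collected on a monolingual corpus."""
    mean_abs = {lang: np.abs(a).mean(axis=0) for lang, a in acts_by_lang.items()}
    specific = {}
    for lang, m in mean_abs.items():
        others = np.mean([v for l, v in mean_abs.items() if l != lang], axis=0)
        # a neuron is "specific" to `lang` if it fires `ratio` times more strongly there
        specific[lang] = np.where(m > ratio * (others + 1e-8))[0]
    return specific

# Toy usage with random activations standing in for real ones
rng = np.random.default_rng(0)
acts = {"en": rng.random((1000, 512)), "fr": rng.random((1000, 512))}
print({lang: idx.size for lang, idx in language_specific_neurons(acts).items()})
```
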
Do Llamas Work in English? On the Latent Language of Multilingual Transformers @Anastasia Voznyuk |
2024 |
Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West |
arXiv preprint |
GitHub |
Authors claim that Llama models internally operate in English and transition to the target language only after roughly the 15th layer. The entropy of the next-token distribution, high in the early layers, decreases towards the final layers.
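
This kind of claim is typically probed with a logit-lens-style analysis: decode every layer's hidden state through the unembedding matrix and track the top token and the entropy of the resulting distribution. A hedged sketch for a Llama-style model (the model name, prompt, and the use of the final RMSNorm before unembedding are assumptions, not necessarily the paper's exact setup):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed; any Llama-style checkpoint works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

# Translation-style prompt in the spirit of the paper's probes (French -> Chinese)
ids = tok('Français: "fleur" - 中文: "', return_tensors="pt").input_ids

with torch.no_grad():
    out = model(ids, output_hidden_states=True)
    unembed = model.get_output_embeddings().weight.float()   # (vocab, d_model)
    final_norm = model.model.norm                             # Llama's final RMSNorm
    for layer, h in enumerate(out.hidden_states):             # embeddings + each layer
        logits = final_norm(h[0, -1].float()) @ unembed.T     # "logit lens" decode
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        print(layer, repr(tok.decode([probs.argmax().item()])), f"entropy={entropy:.2f}")
```
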
Emerging Cross-lingual Structure in Pretrained Language Models @Anastasia Voznyuk |
2020 |
Alexis Conneau, Shijie Wu, Haoran Li, Luke Zettlemoyer, Veselin Stoyanov |
ACL |
- |
This study examines multilingual masked language modeling and explores factors behind its effectiveness for cross-lingual transfer. It finds that transfer is possible even without shared vocabulary or similar text domains, as long as top-layer parameters are shared. Additionally, monolingual BERT representations across languages can be aligned post-hoc, suggesting universal symmetries in embedding spaces, which are discovered and aligned during joint training in multilingual models. |
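
A toy illustration of the post-hoc alignment idea: fit an orthogonal (Procrustes) map between sentence representations of a parallel corpus produced by two monolingual encoders. The mean-pooled sentence vectors and the synthetic rotated data below are assumptions; the paper's actual alignment procedure may differ.
```python
# Toy Procrustes alignment between two monolingual encoders' sentence vectors.
import numpy as np

def procrustes_align(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """src, tgt: (n_pairs, d) representations of a parallel corpus.
    Returns an orthogonal W minimizing ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Synthetic stand-in: the "French" vectors are a rotated copy of the "English" ones
rng = np.random.default_rng(0)
en_vecs = rng.standard_normal((1000, 768))
rotation = np.linalg.qr(rng.standard_normal((768, 768)))[0]
fr_vecs = en_vecs @ rotation
W = procrustes_align(en_vecs, fr_vecs)
print(np.allclose(en_vecs @ W, fr_vecs, atol=1e-6))  # True: the rotation is recovered
```
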
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models @Andrei Semenov |
2024 |
Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Wayne Xin Zhao, Furu Wei, Ji-Rong Wen |
arXiv |
GitHub |
There exist language-specific neurons responsible for generating output in a particular language. Consequently, we can affect the quality of multilingual output by activating or deactivating these neurons.
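
A hedged sketch of the deactivation idea using PyTorch forward hooks; the neuron indices are placeholders and the `model.model.layers[i].mlp.act_fn` module path assumes a Llama-style architecture, not the paper's exact intervention.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # assumed model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Placeholder indices of neurons previously identified as, say, French-specific,
# keyed by decoder-layer index (NOT the paper's actual indices).
french_neurons = {10: [5, 42, 137], 11: [7, 99]}

def make_hook(idx):
    def hook(module, inputs, output):
        output[..., idx] = 0.0          # silence the selected MLP units
        return output
    return hook

handles = []
for layer_id, idx in french_neurons.items():
    act = model.model.layers[layer_id].mlp.act_fn   # Llama-style module path
    handles.append(act.register_forward_hook(make_hook(idx)))

ids = tok("Bonjour, comment ça va ?", return_tensors="pt").input_ids
with torch.no_grad():
    gen = model.generate(ids, max_new_tokens=20)
print(tok.decode(gen[0]))

for h in handles:
    h.remove()                           # restore normal behaviour
```
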
Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs @Andrei Semenov |
2024 |
Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch |
arXiv |
GitHub |
Studies how neuron activation is shared across languages by categorizing neurons into four types: all-shared, partial-shared, specific, and non-activated. Task type affects linguistic sharing patterns, neuron behavior varies across inputs, and all-shared neurons are crucial for correct responses. Increasing all-shared neurons improves accuracy on multilingual tasks.
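
A toy sketch of the four-way taxonomy; the rule that decides whether a neuron counts as "activated" for a language is an assumption, not the paper's exact criterion.
```python
import numpy as np

def categorize(active: np.ndarray, languages: list) -> dict:
    """active: boolean array (n_languages, n_neurons); True if a neuron is
    considered activated for that language (e.g. mean activation above a threshold)."""
    n_lang = active.shape[0]
    counts = active.sum(axis=0)
    labels = {}
    for j, c in enumerate(counts):
        if c == 0:
            labels[j] = "non-activated"
        elif c == n_lang:
            labels[j] = "all-shared"
        elif c == 1:
            labels[j] = f"specific ({languages[int(np.argmax(active[:, j]))]})"
        else:
            labels[j] = "partial-shared"
    return labels

langs = ["en", "fr", "zh"]
active = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 1],
                   [1, 0, 0, 1]], dtype=bool)   # rows: languages, cols: neurons
print(categorize(active, langs))
# {0: 'all-shared', 1: 'specific (en)', 2: 'non-activated', 3: 'partial-shared'}
```
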
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models @Andrei Semenov |
2024 |
Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang G. Cai |
arXiv |
GitHub |
By measuring activation differences across minimal pairs, this study quantifies linguistic similarity in LLMs. Experiments with 100+ LLMs and 150k minimal pairs in three languages reveal that: 1) training data influences linguistic similarity, with higher agreement in high-resource languages, 2) similarity aligns with fine-grained linguistic categories but not broader ones, 3) it is weakly correlated with semantic similarity, showing context dependency, and 4) LLMs show limited cross-lingual alignment in understanding linguistic phenomena. |
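
A rough sketch of computing an activation-difference vector for one minimal pair; the model choice, layer pooling, and the cosine-based cross-lingual comparison mentioned in the comments are assumptions, not the paper's exact protocol.
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder; the study spans 100+ LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def activation_vector(sentence: str) -> torch.Tensor:
    ids = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # mean-pool over tokens, then concatenate all layers into one long vector
    return torch.cat([h.mean(dim=1).squeeze(0) for h in out.hidden_states])

good, bad = "The keys to the cabinet are here.", "The keys to the cabinet is here."
diff = activation_vector(good) - activation_vector(bad)

# Cross-lingual linguistic similarity would then be e.g. the cosine between such
# difference vectors computed for the corresponding minimal pair in another language.
print(diff.shape, diff.norm())
```
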
Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners @Andrei Semenov |
2024 |
Shimao Zhang, Changjiang Gao, Wenhao Zhu, Jiajun Chen, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang |
arXiv |
GitHub |
Most LLMs show unbalanced performance across languages, but translation-based multilingual alignment is effective. This study explores the spontaneous improvement in multilingual alignment when LLMs are instruction-tuned on question translation data (without annotated answers). This boosts alignment between English and many languages, even those not seen during tuning. |
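
An illustrative sketch of what answer-free question-translation training examples might look like; the field names and prompt template are assumptions, not the paper's data format.
```python
question_pairs = [
    {"zh": "法国的首都是哪里？", "en": "What is the capital of France?"},
    {"es": "¿Cuántos planetas hay en el sistema solar?",
     "en": "How many planets are there in the solar system?"},
]

def to_instruction_example(pair: dict) -> dict:
    src_lang = next(k for k in pair if k != "en")
    return {
        "instruction": "Translate the following question into English.",
        "input": pair[src_lang],
        "output": pair["en"],   # only the translated question; no answer annotation
    }

for p in question_pairs:
    print(to_instruction_example(p))
```
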
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs @Nikita_Okhotnikov |
2024 |
Maxim Ifergan, Leshem Choshen, Roee Aharoni, Idan Szpektor, Omri Abend |
arXiv preprint |
- |
LLMs' factual knowledge is inconsistent across languages. A methodology to measure knowledge-representation sharing across languages is proposed. Script similarity is the dominant factor in representation sharing. Multilingual sharing has the potential to increase performance even in the best-performing language.
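
A rough sketch in the spirit of measuring representation sharing: compare a model's pooled hidden states for the same fact expressed in two languages. The encoder, pooling, and cosine metric are assumptions rather than the paper's methodology.
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"   # placeholder multilingual encoder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def fact_repr(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        h = model(**ids).last_hidden_state   # (1, seq_len, d)
    return h.mean(dim=1).squeeze(0)          # mean-pooled sentence vector

en = fact_repr("The capital of France is Paris.")
ru = fact_repr("Париж является столицей Франции.")
print(torch.cosine_similarity(en, ru, dim=0).item())
```
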