Dataset Navigation 🧭

Raw and Filtered Datasets

| Model Name | Dataset | Type | Description |
|---|---|---|---|
| Llama 3.3 70B Instruct | Magpie-Llama-3.3-Pro-1M | SFT | 1M raw conversations built with Meta Llama 3.3 70B. |
| Llama 3.3 70B Instruct | Magpie-Llama-3.3-Pro-500K-Filtered | SFT | Apply a filter and select 500K high-quality conversations. |

| Model Name | Dataset | Type | Description |
|---|---|---|---|
| Llama 3.1 70B Instruct | Magpie-Llama-3.1-Pro-1M | SFT | 1M raw conversations built with Meta Llama 3.1 70B. |
| Llama 3.1 70B Instruct | Magpie-Llama-3.1-Pro-300K-Filtered | SFT | Apply a filter and select 300K high-quality conversations. |
| Llama 3.1 70B Instruct | Magpie-Llama-3.1-Pro-500K-Filtered | SFT | Apply a filter and select 500K high-quality conversations. |
| Llama 3.1 70B Instruct | Magpie-Llama-3.1-Pro-MT-500K | SFT | Extend Magpie-Llama-3.1-Pro-500K-Filtered to multi-turn. |
| Llama 3.1 70B Instruct | Magpie-Llama-3.1-Pro-MT-300K-Filtered | SFT | Select 300K high-quality multi-turn conversations from Magpie-Llama-3.1-Pro-MT-500K. |
| Llama 3.1 70B Instruct | Magpie-Llama-3.1-Pro-DPO-100K | DPO | DPO dataset built via Best-of-N sampling and rewards. |

| Model Name | Dataset | Type | Description |
|---|---|---|---|
| Llama 3 70B Instruct | Magpie-Pro-1M | SFT | 1M raw conversations built with Meta Llama 3 70B. |
| Llama 3 70B Instruct | Magpie-Pro-300K-Filtered | SFT | Apply a filter and select 300K high-quality conversations. |
| Llama 3 70B Instruct | Magpie-Pro-MT-300K | SFT | Select 300K difficult questions and extend to multi-turn conversations. |
| Llama 3 70B Instruct | Magpie-Pro-DPO-100K | DPO | DPO dataset built via Best-of-N sampling and rewards. |
| Llama 3 8B Instruct | Magpie-Air-3M | SFT | 3M raw conversations built with Meta Llama 3 8B. |
| Llama 3 8B Instruct | Magpie-Air-300K-Filtered | SFT | Apply a filter and select 300K high-quality conversations. |
| Llama 3 8B Instruct | Magpie-Air-MT-300K | SFT | Select 300K difficult questions and extend to multi-turn conversations. |
| Llama 3 8B Instruct | Magpie-Air-DPO-100K | DPO | DPO dataset built via Best-of-N sampling and rewards. |

Qwen2.5

| Model Name | Dataset | Type | Description |
|---|---|---|---|
| Qwen2.5 72B Instruct | Magpie-Qwen2.5-Pro-1M | SFT | 1M raw conversations built with Qwen2.5 72B Instruct. |
| Qwen2.5 72B Instruct | Magpie-Qwen2.5-Pro-300K-Filtered | SFT | Apply a filter and select 300K high-quality conversations. |

| Model Name | Dataset | Type | Description |
|---|---|---|---|
| Qwen2 72B Instruct | Magpie-Qwen2-Pro-1M | SFT | 1M raw conversations built with Qwen2 72B Instruct. |
| Qwen2 72B Instruct | Magpie-Qwen2-Pro-300K-Filtered | SFT | Apply a filter and select 300K high-quality conversations. |
| Qwen2 72B Instruct | Magpie-Qwen2-Pro-200K-Chinese | SFT | Apply a filter and select 200K high-quality Chinese conversations. |
| Qwen2 72B Instruct | Magpie-Qwen2-Pro-200K-English | SFT | Apply a filter and select 200K high-quality English conversations. |
| Qwen2 7B Instruct | Magpie-Qwen2-Air-3M | SFT | 3M raw conversations built with Qwen2 7B Instruct. |
| Qwen2 7B Instruct | Magpie-Qwen2-Air-300K-Filtered | SFT | Apply a filter and select 300K high-quality conversations. |

| Model Name | Dataset | Type | Description |
|---|---|---|---|
| Phi-3 Medium Instruct | Magpie-Phi3-Pro-1M | SFT | 1M raw conversations built with Phi-3 Medium Instruct. |
| Phi-3 Medium Instruct | Magpie-Phi3-Pro-300K-Filtered | SFT | Apply a filter and select 300K high-quality conversations. |

| Model Name | Dataset | Type | Description |
|---|---|---|---|
| Gemma-2-27b-it | Magpie-Gemma2-Pro-534K | SFT | 534K conversations built with Gemma-2-27b-it. |
| Gemma-2-27b-it | Magpie-Gemma2-Pro-200K-Filtered | SFT | Apply a filter and select 200K conversations. |
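
As a rough illustration of how a dataset name from the tables above might be turned into something loadable, here is a minimal Python sketch. It rests on assumptions that this page does not state: that the datasets are hosted on the Hugging Face Hub under a namespace called `Magpie-Align`, that each one exposes a `train` split, and that the `datasets` library is installed. The helpers `hub_id` and `load_magpie` are hypothetical names introduced only for this example.

```python
# Assumption: the hosting namespace is not given on this page; adjust as needed.
HUB_NAMESPACE = "Magpie-Align"


def hub_id(dataset_name: str, namespace: str = HUB_NAMESPACE) -> str:
    """Join a namespace and a dataset name from the tables into a repo id."""
    return f"{namespace}/{dataset_name.strip()}"


def load_magpie(dataset_name: str, split: str = "train"):
    """Download one split of a dataset (hypothetical helper).

    Needs `pip install datasets` and network access. The "train" split
    name is an assumption, not something this page documents.
    """
    # Imported lazily so that hub_id() works without the dependency.
    from datasets import load_dataset

    return load_dataset(hub_id(dataset_name), split=split)


if __name__ == "__main__":
    # Only builds the identifier; nothing is downloaded here.
    print(hub_id("Magpie-Llama-3.3-Pro-500K-Filtered"))
```

Calling `load_magpie("Magpie-Llama-3.3-Pro-500K-Filtered")` would then fetch that dataset if the assumptions hold. Keeping the identifier logic separate from the download makes it easy to swap in a different namespace or a local mirror.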

Domain Datasets

Reasoning

| Model | Dataset | Type | Description |
|---|---|---|---|
| Qwen2-72B-Instruct + Llama-3-70B-Instruct | Magpie-Reasoning-150K | SFT | 150K conversations built with Qwen2-72B-Instruct + Llama-3-70B-Instruct. |
| Llama3.1-70B-Instruct + Llama3.3-70B-Instruct | Magpie-LlamaCoT-250K | SFT | 250K conversations built with Llama3.1-70B-Instruct + Llama3.3-70B-Instruct. |

Coding & Debugging

Coming Soon.

Math

Coming Soon.