Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

📃 Paper • 🤗 Datasets • 🤗 Model (coming soon)

Introduction

We present T1, a language model with strong reasoning ability built on open LLMs. T1 is trained by scaling reinforcement learning (RL) while encouraging exploration and by understanding inference scaling. We first initialize the LLM on synthesized chain-of-thought data that integrates trial-and-error and self-verification, then scale RL training. The resulting models exhibit inference scaling behavior and achieve superior performance on challenging mathematical reasoning benchmarks.
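To make the "RL while encouraging exploration" idea concrete, the toy loop below is a minimal REINFORCE sketch with an entropy bonus, one standard way to encourage exploration. It is an illustrative stand-in, not the T1 training code: the 3-armed bandit, `REWARD_PROB`, and `ENT_COEF` are all hypothetical simplifications in which pulling an arm stands in for sampling an answer and the reward stands in for self-verification of that answer.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

ARMS = 3                       # toy "answers" standing in for sampled rollouts
REWARD_PROB = [0.2, 0.5, 0.8]  # arm 2 passes "self-verification" most often
logits = [0.0] * ARMS          # policy parameters
LR, ENT_COEF = 0.1, 0.01       # learning rate and entropy-bonus weight

for step in range(3000):
    probs = softmax(logits)
    # Sample an action from the current policy and score it.
    a = random.choices(range(ARMS), weights=probs)[0]
    reward = 1.0 if random.random() < REWARD_PROB[a] else 0.0
    entropy = -sum(p * math.log(p + 1e-12) for p in probs)
    for k in range(ARMS):
        # REINFORCE term: reward * d log pi(a) / d logit_k
        pg = reward * ((1.0 if k == a else 0.0) - probs[k])
        # Gradient of the entropy bonus w.r.t. logit_k: -p_k (log p_k + H)
        ent_grad = -probs[k] * (math.log(probs[k] + 1e-12) + entropy)
        logits[k] += LR * (pg + ENT_COEF * ent_grad)

print("final policy:", [round(p, 3) for p in softmax(logits)])
```

The entropy term slows the collapse onto a single mode, the schematic analogue of keeping diverse reasoning paths alive during RL training.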

  • [2025/01/22] We have released the paper and SFT data. Model weights and RL training data will be released soon.

Figure 1: Training scaling and inference scaling of T1 on the AIME2024 dataset
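Inference scaling of the kind plotted above is commonly realized by sampling several reasoning chains and aggregating their answers. The self-contained simulation below uses majority voting over independent samples, an assumption for illustration only (the paper's exact inference-scaling procedure may differ), to show how accuracy can grow with the sample budget; `P_CORRECT` and the distractor answers are hypothetical.

```python
import random
from collections import Counter

# Toy simulation of inference-time scaling via majority voting.
# Assumption: each sampled chain independently yields the correct
# answer with probability P_CORRECT, with wrong answers spread
# across a few distractors.

P_CORRECT = 0.4
DISTRACTORS = ["wrong-a", "wrong-b", "wrong-c"]

def sample_answer():
    """One simulated reasoning chain's final answer."""
    if random.random() < P_CORRECT:
        return "correct"
    return random.choice(DISTRACTORS)

def majority_vote(n_samples):
    """Aggregate n_samples chains by taking the most common answer."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]

for n in (1, 4, 16, 64):
    trials = 2000
    acc = sum(majority_vote(n) == "correct" for _ in range(trials)) / trials
    print(f"samples={n:3d}  majority-vote accuracy={acc:.3f}")
```

Because the correct answer is the single most likely outcome per sample, its vote share concentrates as the budget grows, so accuracy rises well above the single-sample rate.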

Results

Our approach achieves competitive performance across challenging mathematical reasoning benchmarks:

| Model | MATH500 | AIME 2024 | Omni-MATH-500 | GPQA |
|---|---|---|---|---|
| GPT-4o | 76.6 | 9.3 | 26.8 | 53.6 |
| Claude-3.5-Sonnet | 78.3 | 16.0 | - | 65.0 |
| Llama-3.3-70B-Instruct | 73.9 | 24.2 | 27.9 | 50.5 |
| Qwen2.5-Math-7B-Instruct | 82.7 | 16.7 | 29.7 | 36.9 |
| o1-preview | 85.5 | 44.6 | - | 72.3 |
| QwQ-32B-preview | 90.6 | 50.0 | 46.6 | 58.2 |
| T1-SFT (GLM-4-9B) | 60.2 | 4.1 | 20.0 | 37.2 |
| T1 (GLM-4-9B) | 65.8 | 9.2 | 24.4 | 38.1 |
| T1-SFT (Qwen2.5-14B) | 77.2 | 10.3 | 28.5 | 42.3 |
| T1 (Qwen2.5-14B) | 87.4 | 30.5 | 38.6 | 48.3 |
| T1-SFT (Qwen2.5-32B) | 83.4 | 24.9 | 34.6 | 49.5 |
| T1 (Qwen2.5-32B) | 92.4 | 50.6 | 49.6 | 56.1 |

📚 Citation

Coming soon!
