Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

📃Paper • 🤗Datasets • 🤗Model (coming soon)

Introduction

We present T1, a model with strong reasoning ability. T1 is trained by scaling reinforcement learning (RL) through encouraged exploration, and we use it to understand inference scaling. We first initialize the LLM using synthesized chain-of-thought data that integrates trial-and-error and self-verification, then apply RL training. With open LLMs as its base, T1 exhibits inference scaling behavior and achieves superior performance on challenging math reasoning benchmarks.

[2025/01/22] We have released the paper and SFT data. Model weights and RL training data will be released soon.

Figure 1: Training scaling and inference scaling of T1 on the AIME2024 dataset

Results

Our approach achieves competitive performance across challenging mathematical reasoning benchmarks:

Model	MATH500	AIME	Omni-MATH-500	GPQA
GPT-4o	76.6	9.3	26.8	53.6
Claude-3.5-sonnet	78.3	16.0	-	65.0
Llama-3.3-70B-Instruct	73.9	24.2	27.9	50.5
Qwen2.5-Math-7B-Instruct	82.7	16.7	29.7	36.9
o1-preview	85.5	44.6	-	72.3
QwQ-32B-preview	90.6	50.0	46.6	58.2

T1-SFT (GLM-4-9B)	60.2	4.1	20.0	37.2
T1 (GLM-4-9B)	65.8	9.2	24.4	38.1
T1-SFT (Qwen2.5-14B)	77.2	10.3	28.5	42.3
T1 (Qwen2.5-14B)	87.4	30.5	38.6	48.3
T1-SFT (Qwen2.5-32B)	83.4	24.9	34.6	49.5
T1 (Qwen2.5-32B)	92.4	50.6	49.6	56.1
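
To make the effect of RL training easier to read off the table, the sketch below computes the difference between each T1 row and its T1-SFT counterpart on every benchmark. The scores are copied from the table above; the names `BENCHMARKS`, `SCORES`, and `rl_gains` are illustrative and are not part of this repository.

```python
# Scores copied from the results table above, in the column order
# MATH500, AIME, Omni-MATH-500, GPQA. All names here are illustrative.
BENCHMARKS = ("MATH500", "AIME", "Omni-MATH-500", "GPQA")

SCORES = {
    "GLM-4-9B": {
        "T1-SFT": (60.2, 4.1, 20.0, 37.2),
        "T1": (65.8, 9.2, 24.4, 38.1),
    },
    "Qwen2.5-14B": {
        "T1-SFT": (77.2, 10.3, 28.5, 42.3),
        "T1": (87.4, 30.5, 38.6, 48.3),
    },
    "Qwen2.5-32B": {
        "T1-SFT": (83.4, 24.9, 34.6, 49.5),
        "T1": (92.4, 50.6, 49.6, 56.1),
    },
}


def rl_gains(scores):
    """Return {base model: {benchmark: T1 score minus T1-SFT score}}."""
    gains = {}
    for base, rows in scores.items():
        gains[base] = {
            name: round(rl - sft, 1)  # round away floating-point noise
            for name, sft, rl in zip(BENCHMARKS, rows["T1-SFT"], rows["T1"])
        }
    return gains


if __name__ == "__main__":
    for base, per_benchmark in rl_gains(SCORES).items():
        cells = "\t".join(
            f"{name} {delta:+.1f}" for name, delta in per_benchmark.items()
        )
        print(f"{base}\t{cells}")
```

On these numbers, every T1 model improves over its SFT initialization on all four benchmarks. The largest difference is on AIME with the Qwen2.5-32B base (+25.7 points) and the smallest is on GPQA with the GLM-4-9B base (+0.9 points).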

📚 Citation

Coming soon!
