Alibaba Cloud has unveiled a new series of language models called Qwen2-Math.
The models are optimized specifically for mathematical tasks and are designed to outperform even the most advanced general-purpose language models, such as GPT-4 and Claude, on math benchmarks.
The Qwen2-Math and Qwen2-Math-Instruct models come in a range of sizes, from 1.5 to 72 billion parameters.
They’re based on Alibaba’s existing Qwen2 language models, but with additional pre-training on a specialized math corpus. This corpus includes high-quality mathematical web texts, books, code, exam questions, and math-focused data generated by the Qwen2 models themselves.
In benchmarks such as GSM8K, MATH, and MMLU-STEM, the largest model, Qwen2-Math-72B-Instruct, has outperformed GPT-4, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B.
The models have also achieved top scores in Chinese math benchmarks like CMATH, GaoKao Math Cloze, and GaoKao Math QA.
While Alibaba reports that the Qwen2-Math models can solve simpler math competition problems, the company emphasizes that they “do not guarantee the correctness of the claims in the process.”
To ensure the integrity of the benchmark results, the Qwen team says it thoroughly decontaminated both the pre-training and post-training datasets, removing samples that overlap with the test sets.
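Alibaba has not published the exact procedure, but decontamination of this kind is commonly implemented by dropping training samples that share long n-grams with benchmark items. The Python sketch below illustrates that general idea; the 13-gram window and the helper names are illustrative assumptions, not Qwen's published pipeline.

```python
# Illustrative sketch of n-gram-based decontamination (assumed approach,
# not Qwen's actual code): drop training samples that share any long
# n-gram with a benchmark test item.

def ngrams(text: str, n: int = 13) -> set:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_samples: list[str], test_samples: list[str], n: int = 13) -> list[str]:
    # Collect every n-gram that appears in any test item.
    test_ngrams = set()
    for sample in test_samples:
        test_ngrams |= ngrams(sample, n)
    # Keep only training samples with no n-gram overlap against the test set.
    return [s for s in train_samples if not (ngrams(s, n) & test_ngrams)]
```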
The Qwen2-Math models are currently available on Hugging Face under the Tongyi Qianwen license, which requires a separate commercial license for products serving more than 100 million monthly active users.
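For readers who want to try the weights directly, a minimal loading sketch with the Hugging Face transformers library might look like the following; the model ID, prompt, and generation settings are examples, and the chat-template call follows standard transformers conventions rather than any Qwen-specific API.

```python
# Example sketch: loading a Qwen2-Math-Instruct checkpoint from Hugging Face
# and generating a short solution. Model ID and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-Math-7B-Instruct"  # smaller sibling of the 72B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "Solve for x: 3x + 7 = 22."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```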
Alibaba plans to release bilingual models supporting both English and Chinese in the near future, with multilingual models in development.
Alibaba’s push for more powerful and specialized AI models reflects the growing demand for advanced logical and mathematical capabilities in artificial intelligence.
While it remains to be seen whether training language models solely on math problems will lead to fundamental improvements in logical reasoning, the Qwen2-Math series represents an intriguing step forward in the quest for more capable and versatile AI systems.