Grok 0 vs Grok 1 vs Grok 1.5: AI Benchmark Comparison
AI company xAI has released Grok-1.5, its new large language model (LLM), with improved reasoning performance and a longer context window of up to 128,000 tokens. Using the benchmark results xAI shared, we can now compare Grok-0, Grok-1, and Grok-1.5.
The comparison covers the following benchmarks:
- MMLU (Massive Multitask Language Understanding) – Evaluates a model's language understanding and reasoning with multiple-choice questions across 57 subjects, such as elementary mathematics, US history, computer science, and law.
- MATH – Measures the ability to solve competition-level mathematics problems covering several subjects (such as algebra, geometry, and number theory) across five difficulty levels.
- GSM8K – A dataset of about 8,500 grade-school math word problems, each requiring several steps of basic arithmetic to solve.
- HumanEval – A code-generation benchmark of 164 hand-written Python programming problems; a generated solution counts as correct only if it passes the problem's unit tests.
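HumanEval results are commonly reported as pass@k: the probability that at least one of k generated solutions to a problem passes its unit tests. The sketch below uses the unbiased estimator described in the original HumanEval paper (Chen et al., 2021), 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass. The function names and the toy results are hypothetical; xAI's 0-shot HumanEval figures are presumably pass@1, but the article does not say so.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of code samples generated for the problem
    c: number of those samples that passed all unit tests
    k: number of attempts the metric allows
    """
    if n - c < k:
        # Every possible set of k samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k samples drawn without replacement fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results for three problems: (samples generated, samples passing)
toy_results = [(10, 10), (10, 3), (10, 0)]
print(f"pass@1 = {benchmark_pass_at_k(toy_results, k=1):.3f}")
```

With k = 1 the estimator reduces to c / n for each problem, so pass@1 is simply the average fraction of passing samples.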
Comparison:
Grok-0 scored 65.7 percent (5-shot) on MMLU, 15.7 percent (4-shot) on MATH, 56.8 percent (8-shot) on GSM8K, and 39.7 percent (0-shot) on HumanEval.
Grok-1 scored 73.0 percent (5-shot) on MMLU, 23.9 percent (4-shot) on MATH, 62.9 percent (8-shot) on GSM8K, and 63.2 percent (0-shot) on HumanEval.
Grok-1.5 scored 81.3 percent (5-shot) on MMLU, 50.6 percent (4-shot) on MATH, 90.0 percent (8-shot) on GSM8K, and 74.1 percent (0-shot) on HumanEval.
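The "(5-shot)", "(8-shot)" and "(0-shot)" labels give the number of solved examples placed in the prompt before the test question. A minimal sketch of assembling such a prompt follows; the "Question:/Answer:" template, the function name, and the example questions are hypothetical, and real evaluation harnesses use their own formats.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], question: str, shots: int) -> str:
    """Assemble a k-shot prompt: `shots` solved examples, then the test question.

    The "Question:/Answer:" template is an illustrative assumption.
    """
    if shots > len(examples):
        raise ValueError("not enough solved examples for the requested shot count")
    # Each solved example shows the model the expected question/answer format.
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:shots]]
    # The test question is left unanswered for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical solved examples (not taken from any real benchmark split)
solved = [
    ("What is 2 + 3?", "5"),
    ("What is 4 * 6?", "24"),
]
print(build_few_shot_prompt(solved, "What is 7 - 2?", shots=2))
```

With `shots=0` the prompt contains only the test question, which corresponds to a 0-shot evaluation such as the HumanEval figures above.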
Advancements:
Grok-0 had 33 billion parameters. The recent open-source release of Grok-1's weights and architecture revealed that Grok-1 is a 314-billion-parameter Mixture-of-Experts model, nearly ten times the size of Grok-0. The parameter count of Grok-1.5 has not been disclosed yet.
On MMLU, which tests reasoning and comprehension, Grok-1.5 scores 8.3 percentage points higher than Grok-1 and 15.6 points higher than Grok-0. Its MATH score is 26.7 points above Grok-1's and 34.9 points above Grok-0's. On GSM8K, the new model scores 27.1 points higher than Grok-1 and 33.2 points higher than Grok-0, and on HumanEval it gains 10.9 points over Grok-1 and 34.4 points over Grok-0.
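The gaps between models are absolute differences in percentage points, which differ from relative improvements. A small sketch that recomputes both from the scores xAI reported; the dictionary layout and the function names are our own.

```python
# Benchmark scores (percent) as reported by xAI for each model.
SCORES = {
    "Grok-0": {"MMLU": 65.7, "MATH": 15.7, "GSM8K": 56.8, "HumanEval": 39.7},
    "Grok-1": {"MMLU": 73.0, "MATH": 23.9, "GSM8K": 62.9, "HumanEval": 63.2},
    "Grok-1.5": {"MMLU": 81.3, "MATH": 50.6, "GSM8K": 90.0, "HumanEval": 74.1},
}


def point_gain(new: str, old: str, benchmark: str) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(SCORES[new][benchmark] - SCORES[old][benchmark], 1)


def relative_gain(new: str, old: str, benchmark: str) -> float:
    """Improvement relative to the older model's score, in percent."""
    old_score = SCORES[old][benchmark]
    return round(100 * (SCORES[new][benchmark] - old_score) / old_score, 1)


for bench in ("MMLU", "MATH", "GSM8K", "HumanEval"):
    for old in ("Grok-1", "Grok-0"):
        print(f"{bench}: Grok-1.5 vs {old}: "
              f"+{point_gain('Grok-1.5', old, bench)} pts "
              f"({relative_gain('Grok-1.5', old, bench)}% relative)")
```

For example, the 26.7-point MATH gain over Grok-1 amounts to a relative improvement of more than 100 percent, because Grok-1's baseline score was only 23.9 percent.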
Grok-1.5 will soon be released in early access to subscribers on the X social platform, with availability expanding gradually.
(Source)