Longterm Wiki

Welcome | xAI — Creators of Grok, the AI Chatbot


Grok 3's release is relevant to AI safety researchers tracking frontier capability jumps, compute scaling trends, and the competitive dynamics among major AI labs that influence deployment norms and safety practices.

Metadata

Importance: 35/100 · Tags: press release, news

Summary

Official announcement page for Grok 3, xAI's latest large language model. The page introduces Grok 3's capabilities and positions it as a frontier AI system developed by Elon Musk's AI company xAI, competing with leading models from OpenAI, Anthropic, and Google.

Key Points

  • Grok 3 is xAI's most advanced language model, trained on the Colossus supercluster with significantly more compute than predecessors.
  • The model claims state-of-the-art performance on reasoning and coding benchmarks.
  • xAI positions Grok 3 as a direct competitor to GPT-4o, Claude, and Gemini in the frontier AI landscape.
  • Release marks xAI's rapid capability scaling since its founding in 2023, relevant to tracking AI progress timelines.
  • Availability through the Grok chatbot and API signals xAI's move into enterprise and developer markets.

3 FactBase facts citing this source

Entity  Property          Value         As Of
xAI     Benchmark Score   1,402         2025
xAI     Benchmark Score   93.3          2025
xAI     Model Parameters  2.7 trillion  2025

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 16 KB
Grok 3 Beta — The Age of Reasoning Agents | xAI — February 19, 2025

 We are thrilled to unveil an early preview of Grok 3, our most advanced model yet, blending superior reasoning with extensive pretraining knowledge.

 Next-Generation Intelligence from xAI 

 We are pleased to introduce Grok 3, our most advanced model yet: blending strong reasoning with extensive pretraining knowledge.
Trained on our Colossus supercluster with 10x the compute of previous state-of-the-art models, Grok 3 displays significant improvements in reasoning, mathematics, coding, world knowledge, and instruction-following tasks.
Grok 3's reasoning capabilities, refined through large scale reinforcement learning, allow it to think for seconds to minutes, correcting errors, exploring alternatives, and delivering accurate answers. Grok 3 has leading performance across both academic benchmarks and real-world user preferences, achieving an Elo score of 1402 in the Chatbot Arena. Alongside it, we’re unveiling Grok 3 mini, which represents a new frontier in cost-efficient reasoning. Both models are still in training and will evolve rapidly with your feedback. We are rolling out Grok 3 to users in the coming days, along with an early preview of its reasoning capabilities.
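The Chatbot Arena score cited above comes from pairwise human preference votes between anonymized models. As a rough illustration (a simplified sketch; the Arena leaderboard actually fits a Bradley-Terry model over all votes rather than running sequential updates), the classic Elo update after one head-to-head comparison looks like this. The model names and K-factor here are hypothetical:

```python
from collections import defaultdict

def update_elo(ratings, winner, loser, k=32):
    """One Elo update after a pairwise comparison.
    Expected score uses the logistic curve with the standard 400-point scale."""
    ra, rb = ratings[winner], ratings[loser]
    expected_win = 1 / (1 + 10 ** ((rb - ra) / 400))  # P(winner beats loser)
    ratings[winner] = ra + k * (1 - expected_win)     # winner gains
    ratings[loser] = rb - k * (1 - expected_win)      # loser loses the same amount

# Hypothetical example: two models start at 1000; the winner of one vote moves up.
ratings = defaultdict(lambda: 1000.0)
update_elo(ratings, "model_a", "model_b")
# ratings["model_a"] → 1016.0, ratings["model_b"] → 984.0
```

A gap of about 400 points corresponds to the higher-rated model being expected to win roughly 90% of comparisons, which is why scores in the 1400s place a model near the top of the leaderboard.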

 Thinking Harder: Test-time Compute and Reasoning 

 Today, we are announcing two beta reasoning models, Grok 3 (Think) and Grok 3 mini (Think). They were trained using reinforcement learning (RL) at an unprecedented scale to refine their chain-of-thought processes, enabling advanced reasoning in a data-efficient manner. With RL, Grok 3 (Think) learned to refine its problem-solving strategies, correct errors through backtracking, simplify steps, and utilize the knowledge it picked up during pretraining. Just like a human tackling a complex problem, Grok 3 (Think) can spend anywhere from a few seconds to several minutes reasoning, often considering multiple approaches, verifying its own solution, and evaluating how to precisely meet the requirements of the problem.

 
 Both models are still in training, but already they show remarkable performance across a range of benchmarks. We tested these models on the 2025 American Invitational Mathematics Examination (AIME), which was released just 7 days ago on Feb 12th. With our highest level of test-time compute (cons@64), Grok 3 (Think) achieved 93.3% on this competition. Grok 3 (Think) also attained 84.6% on graduate-level expert reasoning (GPQA), and 79.4% on LiveCodeBench for code generation and problem-solving. Furthermore, Grok 3 mini reaches a new frontier in cost-efficient reasoning for STEM tasks that don't require as much world knowledge, reaching 95.8% on AIME 2024 and 80.4% on LiveCodeBench.
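The "cons@64" metric mentioned above is generally understood as consensus (majority voting) over 64 sampled solutions: the model answers the same problem 64 times and the most common final answer is scored. A minimal sketch, assuming the sampled final answers have been extracted as comparable strings (the sample values below are hypothetical):

```python
from collections import Counter

def consensus_answer(samples):
    """cons@N: take N sampled final answers and return the most common one.
    Ties are broken by first appearance, per Counter.most_common ordering."""
    answer, _count = Counter(samples).most_common(1)[0]
    return answer

# Hypothetical final answers from 4 samples of the same problem.
samples = ["113", "113", "27", "113"]
print(consensus_answer(samples))  # → 113
```

Majority voting is a way of spending extra test-time compute: individual samples may derail, but the correct answer tends to recur more consistently than any particular wrong one.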

 [Benchmark chart: AIME’25 (Competition Math), AIME’24 (Competition Math), GPQA (Graduate-Level Google-Proof Q&A, Diamond), LCB (Code Generation: 10/1/2024 - 2/1/2025), MMMU (Multimodal Understanding)]

 
 To use Grok 3’s reasoning 

... (truncated, 16 KB total)
Resource ID: kb-9e8c6430d83420da | Stable ID: sid_rNB7C5OHoT