Longterm Wiki
benchmark-result

Claude 3.5 Sonnet on GSM8K: 96.4

Child of GSM8K

Metadata

Source Table: benchmark_results
Source ID: i5f76T7Z3K
Parent: GSM8K
Children:
Created: Apr 24, 2026, 7:31 PM
Updated: Apr 24, 2026, 7:31 PM
Synced: Apr 24, 2026, 7:31 PM

Record Data

id: i5f76T7Z3K
benchmarkId: fjjBrOI3p2
modelId: Claude 3.5 Sonnet (ai-model)
score: 96.4
unit: percent
date: 2024-06-21
sourceUrl:
notes: Grade school math word problems, multi-step reasoning

Source Check Verdicts

confirmed (95% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed
