sid_bFjrDfX8rQ / HellaSwag: 85.5
Verdict: confirmed · 95%
1 check · 4/24/2026
Inline sourcing: confirmed
Our claim
entire record
- Benchmark: nD2CFoyeBf
- Model: GPT-3.5 Turbo
- Score: 85.5
- Unit: percent
- Date: March 15, 2023
- Notes: 10-shot evaluation from GPT-3.5 technical report
Source evidence
1 src · 1 check · confirmed · 95% · inline-submission · 4/24/2026
Case № DLmcavXBJ3 · Filed 4/24/2026 · Confidence 95%