All Source Checks
Automated source checking of wiki data against original sources. Each record is checked against one or more external sources to confirm accuracy.
Verified Correct: 50 (96% of checked)
Has Issues: 0 (0% of checked)
Can't Verify: 2 (4% of checked)
Not Yet Checked: 0 (of 52 total)
Contradicted: 0 (none found)
Outdated: 0 (all current)
Accuracy Rate: 100%, computed as confirmed / (confirmed + wrong + outdated)
Needs Recheck: 0 (all up to date)
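The summary figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the dashboard's actual implementation: the function names are hypothetical, and it assumes that "wrong" in the accuracy formula corresponds to the Contradicted count and that "checked" means the 52 total records minus those not yet checked.

```python
def accuracy_rate(confirmed: int, wrong: int, outdated: int) -> float:
    """Return confirmed / (confirmed + wrong + outdated) as a percentage.

    Records that could not be verified are left out of the denominator,
    matching the formula shown on the page.
    """
    decided = confirmed + wrong + outdated
    if decided == 0:
        raise ValueError("no confirmed, wrong, or outdated verdicts to rate")
    return 100.0 * confirmed / decided


def share_of_checked(count: int, checked: int) -> int:
    """Return count as a whole-number percentage of all checked records."""
    if checked == 0:
        raise ValueError("no checked records")
    return round(100.0 * count / checked)


# Counts taken from the summary above.
total_records = 52
not_yet_checked = 0
checked = total_records - not_yet_checked  # assumption: checked = total - unchecked

print(accuracy_rate(confirmed=50, wrong=0, outdated=0))  # 100.0
print(share_of_checked(50, checked))                     # 96
print(share_of_checked(2, checked))                      # 4
```

Under these assumptions, the two "Can't Verify" records lower the share of verified records to 96% but do not affect the accuracy rate, since they appear in neither the numerator nor the denominator.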
sid_v1e1ZwDwoA / GSM8K: 40.3
sid_v1e1ZwDwoA / HumanEval: 30.5
sid_v1e1ZwDwoA / HellaSwag: 84
sid_v1e1ZwDwoA / MMLU: 60.1
sid_kWPQCvjKSg / MATH: 73.8
sid_kWPQCvjKSg / HumanEval: 89
sid_kWPQCvjKSg / MMLU: 87.3
sid_nnv09Wl5OQ / Chatbot Arena Elo: 1402
sid_nnv09Wl5OQ / LiveCodeBench: 79.4
sid_nnv09Wl5OQ / GSM8K: 89.3
sid_nnv09Wl5OQ / MMLU-Pro: 79.9
sid_nnv09Wl5OQ / HumanEval: 86.5
sid_oSG59ppF7g / Aider Polyglot: 9.8
sid_oSG59ppF7g / MMLU: 80.1
sid_nywmt9QdsA / MMLU: 80.1
sid_bFjrDfX8rQ / GSM8K: 57.1
sid_bFjrDfX8rQ / DROP: 61.4
sid_bFjrDfX8rQ / WinoGrande: 81.6
sid_bFjrDfX8rQ / TruthfulQA: 47
sid_bFjrDfX8rQ / HellaSwag: 85.5
sid_Gqv7h9oEwA / HellaSwag: 95
sid_Gqv7h9oEwA / GSM8K: 92
sid_Gqv7h9oEwA / MATH: 76.6
sid_Gqv7h9oEwA / MGSM: 90.5
sid_Gqv7h9oEwA / HumanEval: 90.2
sid_Gqv7h9oEwA / MMLU: 88.7
sid_PaKhQQNPkg / MATH: 78.3
sid_PaKhQQNPkg / MMLU: 92.4
sid_PaKhQQNPkg / HumanEval: 89.7
sid_PaKhQQNPkg / MMLU-Pro: 90.99
sid_PaKhQQNPkg / SWE-bench Verified: 80.6
sid_PaKhQQNPkg / Humanity's Last Exam: 44.4
sid_PaKhQQNPkg / ARC-AGI-2: 77.1
sid_PaKhQQNPkg / GPQA Diamond: 94.3
sid_svlbcrT5oQ / BBH: 87.5
sid_svlbcrT5oQ / GPQA Diamond: 59.1
sid_svlbcrT5oQ / DROP: 91.6
sid_svlbcrT5oQ / MMLU-Pro: 75.9
sid_svlbcrT5oQ / HumanEval: 65.2
sid_svlbcrT5oQ / GSM8K: 89.3
sid_svlbcrT5oQ / MATH: 61.6
sid_svlbcrT5oQ / MMLU: 88.5
sid_dHgSM46fMw / SWE-bench Verified: 74.5
sid_dHgSM46fMw / GPQA Diamond: 80.9
sid_y87VxEBBIA / SWE-bench Verified: 73.3
Data from source_check_verdicts table.