
GitHub - openai/prm800k: 800,000 step-level correctness labels on LLM solutions to MATH problems · GitHub

web

Credibility Rating

3/5
Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: GitHub

This dataset is the empirical foundation for OpenAI's work on process supervision vs. outcome supervision; closely related to debates about scalable oversight, recursive reward modeling, and catching reasoning errors in capable AI systems.

Metadata

Importance: 72/100 | Type: dataset

Summary

PRM800K is a dataset released by OpenAI containing 800,000 step-level human correctness labels on large language model solutions to MATH competition problems. It supports training and evaluating process reward models (PRMs), which provide feedback on individual reasoning steps rather than final answers. This dataset underpins research into process supervision as a method for improving LLM reasoning reliability and safety.

Key Points

  • Contains 800,000 step-level correctness labels on LLM-generated solutions to MATH benchmark problems, enabling granular supervision of reasoning chains.
  • Supports training Process Reward Models (PRMs) that score each step of a solution, rather than only the final outcome.
  • Process supervision is proposed as a safer and more effective alternative to outcome supervision for catching subtle reasoning errors.
  • Released alongside OpenAI's paper "Let's Verify Step by Step", which reports that PRMs outperform outcome reward models on problems from the MATH test set.
  • Relevant to AI safety as step-level feedback can help detect and reduce deceptive or flawed reasoning in LLMs.
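To make the idea of step-level scoring concrete, here is a minimal Python sketch of how per-step scores might be combined to rank candidate solutions. It assumes a PRM has already produced a correctness probability for each step, and that a solution's score is the product of those probabilities. The candidate data, the `step_probs` key, and the function names are hypothetical and are not part of the PRM800K repository.

```python
from math import prod


def solution_score(step_probs):
    """Combine per-step correctness probabilities into one solution score.

    Assumption: the score is the product of the step probabilities, i.e. the
    probability that every step is correct if steps are scored independently.
    """
    return prod(step_probs)


def pick_best(candidates):
    """Return the candidate whose aggregated step score is highest."""
    return max(candidates, key=lambda c: solution_score(c["step_probs"]))


# Hypothetical per-step probabilities for two candidate solutions.
candidates = [
    {"name": "A", "step_probs": [0.9, 0.9, 0.9]},
    {"name": "B", "step_probs": [0.99, 0.99, 0.2]},
]

best = pick_best(candidates)
print(best["name"], round(solution_score(best["step_probs"]), 3))
```

A product penalizes a single low-confidence step heavily, which is why candidate B loses despite two near-certain steps. Other aggregations (for example, the minimum step score) are possible and would be a different design choice.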

Cited by 2 pages

Page | Type | Quality
Process Supervision | Approach | 65.0
Scalable Oversight | Research Area | 68.0

Cached Content Preview

HTTP 200 | Fetched Apr 9, 2026 | 12 KB
 PRM800K: A Process Supervision Dataset

 
 [Blog Post] [Paper] 

 
 This repository accompanies the paper Let's Verify Step by Step and presents the PRM800K dataset introduced there. PRM800K is a process supervision dataset containing 800,000 step-level correctness labels for model-generated solutions to problems from the MATH dataset. More information on PRM800K and the project can be found in the paper.

 We are releasing the raw labels as well as the instructions we gave labelers during phase 1 and phase 2 of the project. Example labels can be seen in the image below.

 
 

 Data

 
 The data/ folder contains our labels formatted as newline-delimited JSON, with one JSON object per line. The data has been uploaded with Git LFS, which you'll need to install in order to properly clone the repository.
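As a rough illustration of reading this format, the Python sketch below parses a newline-delimited JSON file one line at a time. It assumes the files on disk hold plain JSON without the `//` annotations used in the documentation example. It also checks for the Git LFS pointer header, which is what a file contains when the repository was cloned without Git LFS. The function name and the temporary demo file are hypothetical.

```python
import json
import os
import tempfile

# First line of a Git LFS pointer file (present when LFS content was not fetched).
LFS_POINTER_PREFIX = "version https://git-lfs.github.com/spec/v1"


def read_jsonl(path):
    """Return a list with one dict per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            if line_number == 1 and line.startswith(LFS_POINTER_PREFIX):
                raise ValueError(
                    f"{path} looks like a Git LFS pointer; run `git lfs pull` first"
                )
            records.append(json.loads(line))
    return records


# Demo on a small temporary file with made-up records.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"labeler": "a", "generation": null}\n')
        f.write("\n")
        f.write('{"labeler": "b", "generation": 9}\n')
    demo_records = read_jsonl(demo_path)

print(len(demo_records), demo_records[1]["generation"])
```

Reading line by line keeps memory use proportional to what you retain, not to the file size. For large files, a generator that yields each record would avoid building the list at all.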

 Each line represents one full solution sample and can contain many step-level labels. Here is one annotated line:

 {
   // UUID representing a particular labeler.
   "labeler": "340d89bc-f5b7-45e9-b272-909ba68ee363",

   // The timestamp this trajectory was submitted.
   "timestamp": "2023-01-22T04:34:27.052924",

   // In phase 2, we split our data collection into generations, using our best
   // PRM so far to pick which solutions to score in the next generation.
   // In phase 1, this value should always be null.
   "generation": 9,

   // In each generation, we reserve some solutions for quality control. We serve
   // these solutions to every labeler, and check that they agree with our
   // gold labels.
   "is_quality_control_question": false,

   // generation -1 was reserved for a set of 30 questions we served every
   // labeler in order to screen for base task performance.
   "is_initial_screening_question": false,

   // Metadata

... (truncated, 12 KB total)
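The top-level fields annotated above are enough to sort records into coarse buckets. The Python sketch below is one way to do that, using only the `generation`, `is_quality_control_question`, and `is_initial_screening_question` fields described in the comments. The bucket names, the `classify` function, and the sample records are hypothetical, and the order of the checks is an assumption, not something the repository prescribes.

```python
from collections import Counter


def classify(record):
    """Assign a record to a coarse bucket based on its top-level fields."""
    if record.get("is_initial_screening_question"):
        return "screening"        # served to every labeler as a screen
    if record.get("is_quality_control_question"):
        return "quality_control"  # checked against gold labels
    if record.get("generation") is None:
        return "phase1"           # generation is null in phase 1
    return "phase2"


# Made-up records that carry only the fields used here.
records = [
    {"generation": None, "is_quality_control_question": False,
     "is_initial_screening_question": False},
    {"generation": 9, "is_quality_control_question": False,
     "is_initial_screening_question": False},
    {"generation": 9, "is_quality_control_question": True,
     "is_initial_screening_question": False},
    {"generation": -1, "is_quality_control_question": False,
     "is_initial_screening_question": True},
]

counts = Counter(classify(r) for r in records)
print(dict(counts))
```

Whether to exclude quality-control and screening samples from training data depends on the use case. The sketch only separates them so that choice can be made explicitly.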
Resource ID: eccb4758de07641b | Stable ID: sid_xcnS5dcMHy