Brier Score - Wikipedia

web

Wikipedia·en.wikipedia.org/wiki/Brier_score

Credibility Rating

3/5

Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Wikipedia

The Brier score is a standard metric for evaluating probabilistic predictions, relevant to AI safety for assessing calibration of AI systems and forecasting models used in risk assessment.

Metadata

Importance: 35/100 · wiki page · reference

Summary

The Brier score is a strictly proper scoring rule measuring the accuracy of probabilistic predictions via mean squared error between predicted probabilities and actual outcomes. It applies to binary and categorical outcomes and rewards well-calibrated probability estimates. Lower scores indicate better-calibrated predictions.

Key Points

• The Brier score measures the mean squared difference between predicted probabilities and actual binary/categorical outcomes, ranging from 0 (perfect) to 1.
• It is a strictly proper scoring rule, meaning it incentivizes honest probability reporting rather than strategic misreporting.
• Applicable to binary and categorical outcomes but inappropriate for ordinal variables with three or more ordered values.
• Proposed by Glenn W. Brier in 1950, it remains a standard metric for evaluating probabilistic forecasts.
• Relevant for evaluating AI model calibration and forecasting systems used in safety-critical decision-making.
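
The "strictly proper" property in the key points can be illustrated numerically. The following is a minimal sketch, not taken from the source: the function names `expected_brier` and `best_report` and the grid search are illustrative assumptions. It shows that, for a binary event with true probability p, the report q that minimises the expected squared error is q = p.

```python
def expected_brier(q: float, p: float) -> float:
    """Expected squared error of reporting q when the event occurs with probability p.

    With probability p the outcome is 1 (error q - 1);
    with probability 1 - p the outcome is 0 (error q).
    """
    return p * (q - 1.0) ** 2 + (1.0 - p) * q ** 2


def best_report(p: float, steps: int = 100) -> float:
    """Grid-search the report q in [0, 1] that minimises the expected score.

    Hypothetical helper for illustration; the grid has steps + 1 points.
    """
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda q: expected_brier(q, p))


if __name__ == "__main__":
    for true_p in (0.1, 0.3, 0.8):
        print(true_p, best_report(true_p))
```

Under these assumptions the minimiser coincides with the true probability, so a forecaster cannot lower the expected score by shading the report away from their actual belief.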

Cited by 1 page

Page	Type	Quality
Forecasting Research Institute (FRI)	Organization	55.0

Cached Content Preview

HTTP 200 · Fetched Apr 21, 2026 · 21 KB

From Wikipedia, the free encyclopedia

Measure of the accuracy of probabilistic predictions
The Brier score is a strictly proper scoring rule that measures the accuracy of probabilistic predictions. For unidimensional predictions, it is strictly equivalent to the mean squared error as applied to predicted probabilities.

The Brier score is applicable to tasks in which predictions must assign probabilities to a set of mutually exclusive discrete outcomes or classes. The set of possible outcomes can be either binary or categorical in nature, and the probabilities assigned to this set of outcomes must sum to one (where each individual probability is in the range of 0 to 1). It was proposed by Glenn W. Brier in 1950. [1]

The Brier score can be thought of as a cost function. More precisely, across all items $i \in \{1, \dots, N\}$ in a set of N predictions, the Brier score measures the mean squared difference between:

• The predicted probability assigned to the possible outcomes for item i
• The actual outcome $o_i$

 Therefore, the lower the Brier score is for a set of predictions, the better the predictions are calibrated. Note that the Brier score, in its most common formulation, takes on a value between zero and one, since this is the square of the largest possible difference between a predicted probability (which must be between zero and one) and the actual outcome (which can take on values of only 0 or 1). In the original (1950) formulation of the Brier score, the range is double, from zero to two.
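
The doubling of the range in the original formulation can be checked with a short derivation. This is a sketch that assumes, as in the 1950 definition (which this excerpt does not reproduce), that the squared error is summed over both classes of a binary event, with $f$ the forecast probability of the event and $o \in \{0, 1\}$ its outcome:

```latex
\[
\underbrace{(f - o)^2}_{\text{event}}
+ \underbrace{\bigl((1 - f) - (1 - o)\bigr)^2}_{\text{non-event}}
= (f - o)^2 + (o - f)^2
= 2\,(f - o)^2
\]
```

Since $(f - o)^2$ is at most 1, the two-class sum is at most 2, which matches the stated range of zero to two.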

 The Brier score is appropriate for binary and categorical outcomes that can be structured as true or false, but it is inappropriate for ordinal variables which can take on three or more values.

 
 Definition

The most common formulation of the Brier score is

$$BS = \frac{1}{N}\sum_{t=1}^{N}(f_t - o_t)^2$$

in which $f_t$ is the probability that was forecast, $o_t$ the actual outcome of the event at instance $t$ ($0$ if it does not happen and $1$ if it does happen) and $N$ is the number of forecasting instances. In effect, it is the mean squared error of the forecast. This formulation is mostly used for binary events (for example "rain" or "no rain"). The above equation is a proper scoring rule only for binary events; if a multi-category forecast is to be evaluated, then the original definition given by Brier below should be used.
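
The binary formulation above translates directly into code. The following is a minimal sketch in plain Python; the function name `brier_score`, the input validation, and the sample forecasts are illustrative assumptions rather than anything given in the source.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities f_t and outcomes o_t.

    forecasts: probabilities in [0, 1] that the event happens.
    outcomes:  1 if the event happened at instance t, 0 otherwise.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if len(forecasts) == 0:
        raise ValueError("at least one forecast is required")
    for f, o in zip(forecasts, outcomes):
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"forecast {f} is not a probability in [0, 1]")
        if o not in (0, 1):
            raise ValueError(f"outcome {o} must be 0 or 1")
    # BS = (1/N) * sum over t of (f_t - o_t)^2
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # Hypothetical data: two forecasts of rain, and it rained both times.
    print(brier_score([0.7, 0.3], [1, 1]))
```

For the hypothetical data shown, the two squared errors are 0.09 and 0.49, so the score is approximately 0.29. A forecaster who assigns probability 1 to every event that happens and 0 to every event that does not would score exactly 0.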

 Example

 Suppose that one is forecasting the probability

... (truncated, 21 KB total)

Resource ID: 432c848791b39d2e | Stable ID: sid_7w9sLNFp1g