Longterm Wiki

Deepfake Detection Challenge Dataset

web
deepfakedetectionchallenge.ai

A key benchmark dataset for AI-generated media detection research; relevant to AI safety discussions around synthetic media misuse, detection capabilities, and evaluation of countermeasures against harmful deepfake technology.

Metadata

Importance: 55/100 · dataset

Summary

The Deepfake Detection Challenge (DFDC) Dataset, released by Facebook AI (now Meta AI) in 2020, is a large-scale benchmark of about 124,000 videos designed to measure and accelerate progress in detecting manipulated media. Created in partnership with industry leaders and academic experts, it features videos of paid actors altered with eight facial modification algorithms. The full dataset was used in a Kaggle competition and remains available outside the challenge to support ongoing deepfake detection research.

Key Points

  • Full dataset contains about 124,000 videos featuring eight facial modification algorithms; a smaller 5k-video preview dataset features two.
  • Created by Facebook/Meta AI in collaboration with industry and academic partners as part of the Deepfake Detection Challenge launched in September 2019.
  • Used in a Kaggle competition to benchmark and develop new deepfake detection models from researchers worldwide.
  • Dataset was ethically created using paid actors who consented to the use and manipulation of their likenesses.
  • Access requires an AWS account with an IAM user and access keys; the associated research papers are on arXiv (1910.08854 for the preview dataset, 2006.07397 for the full dataset).

Cited by 1 page

Page | Type | Quality
AI-Enabled Historical Revisionism | Risk | 43.0

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 5 KB
 JUNE 25, 2020

 Deepfake Detection Challenge Dataset

 The Deepfake Detection Challenge Dataset is designed to measure progress on deepfake detection technology.

 Overview

 We partnered with other industry leaders and academic experts in September 2019 to create the Deepfake Detection Challenge (DFDC) in order to accelerate development of new ways to detect deepfake videos. In doing so, we created and shared a unique new dataset for the challenge consisting of more than 100,000 videos. The DFDC has enabled experts from around the world to come together, benchmark their deepfake detection models, try new approaches, and learn from each other's work.

 The DFDC dataset consists of two versions:

 Preview dataset (5k videos): features two facial modification algorithms; associated research paper: arXiv:1910.08854

 Full dataset (124k videos): features eight facial modification algorithms; associated research paper: arXiv:2006.07397

 

 The full dataset was used by participants in a Kaggle competition to create new and better models for detecting manipulated media. Facebook created the dataset with paid actors who agreed to the use and manipulation of their likenesses in its creation.

 We hope that by making this dataset available outside the challenge, the research community will continue to accelerate progress on detecting harmful manipulated media.

 For more information on Facebook AI's work in this space, see the associated blog post.

 If using this dataset, please cite the paper associated with the relevant dataset (preview/full):

 @misc{DFDC2019Preview,
   title={The Deepfake Detection Challenge (DFDC) Preview Dataset},
   author={Brian Dolhansky and Russ Howes and Ben Pflaum and Nicole Baram and Cristian Canton Ferrer},
   year={2019},
   eprint={1910.08854},
   archivePrefix={arXiv},
   primaryClass={cs.CV}
 }

 @misc{DFDC2020,
   title={The DeepFake Detection Challenge Dataset},
   author={Brian Dolhansky and Joanna Bitton and Ben Pflaum and Jikuo Lu and Russ Howes and Menglin Wang and Cristian Canton Ferrer},
   year={2020},
   eprint={2006.07397},
   archivePrefix={arXiv},
   primaryClass={cs.CV}
 }

 Download Prerequisites

 To access the datasets, each user must have an AWS account with an IAM user and access keys set up, and must have the AWS account ID ready when signing up.

 1. Create an AWS account
 2. Create an IAM user
 3. Make note of the AWS account ID
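
 Before signing up, it can help to confirm locally that the prerequisites above are in place. The following is a minimal sketch, not part of the DFDC tooling: the helper names `is_valid_account_id` and `missing_credentials` are hypothetical, and it assumes the access keys are exposed through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables (keys stored only in a shared credentials file would not be detected). It relies on the fact that an AWS account ID is a 12-digit number.

```python
import os
import re

# Environment variables conventionally read by the AWS CLI and SDKs.
REQUIRED_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def is_valid_account_id(account_id: str) -> bool:
    """Return True if the string looks like an AWS account ID (exactly 12 digits)."""
    return re.fullmatch(r"[0-9]{12}", account_id) is not None


def missing_credentials(env=None) -> list:
    """Return the names of required access-key variables that are unset or empty.

    `env` defaults to the process environment; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    # Hypothetical placeholder ID; substitute the one noted in step 3.
    print("account ID well-formed:", is_valid_account_id("123456789012"))
    print("missing access-key variables:", missing_credentials())
```

 This check only catches formatting and configuration slips; it does not contact AWS or confirm that the keys are valid.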

 Results and Impact of the DFDC

 

 The top-performing model on the public dataset achieved 82.56 percent average precision, a common accuracy measure for computer vision tasks. But whe

... (truncated, 5 KB total)
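
The preview above reports results as average precision. As an illustration of what that metric measures, here is a minimal sketch of the common non-interpolated definition: rank the videos by detector score, then average the precision at the rank of each true fake. The function name and the toy labels and scores are hypothetical, and the exact evaluation protocol used for the DFDC is not stated in the cached text, so this should not be read as the challenge's scoring code.

```python
def average_precision(labels, scores):
    """Non-interpolated average precision for binary labels (1 = fake, 0 = real).

    Items are ranked by descending score; the result is the mean of the
    precision values measured at the rank of each positive item.
    Tied scores keep their input order (no special tie handling).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Indices sorted from highest to lowest detector score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0             # positives seen so far in the ranking
    precision_sum = 0.0  # running sum of precision at each positive
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank

    if hits == 0:
        raise ValueError("average precision is undefined without positives")
    return precision_sum / hits


if __name__ == "__main__":
    # Toy example: two fakes ranked 1st and 3rd among four videos.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(average_precision(labels, scores))  # (1/1 + 2/3) / 2
```

A detector that ranks every fake above every real video scores 1.0; each real video that outranks a fake pulls the value down.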
Resource ID: 4d7d6773b35b5278 | Stable ID: sid_3kSaKGWi6C