DF40: Toward Next-Generation Deepfake Detection
Zhiyuan Yan
Taiping Yao
Shen Chen
Yandan Zhao
Xinghe Fu
Junwei Zhu
Donghao Luo
Li Yuan
Chengjie Wang
Shouhong Ding
Yunsheng Wu
Paper [ArXiv]
Code [GitHub]
Is it possible to detect the various types of AI-generated faces (e.g., face-swapping, talking-head, AIGC, etc.)? This work proposes a comprehensive dataset called DF40, which comprises 40 distinct synthesis techniques, including 10 face-swapping (FS), 13 face-reenactment (FR), 12 entire face synthesis (EFS), and 5 face editing (FE) methods. We then conduct more than 2,000 evaluations on a standard benchmark, leading to several new findings with insightful analysis.

Why use DF40? (highlights)


- Realism and diversity: DF40 contains the latest and most realistic synthesis techniques from various types such as HeyGen (FR), DeepFaceLab (FS), MidJourney-v6 (EFS), Collaborative-Diffusion (FE), etc.
- Multiple scenarios: DF40 contains 31 known/white-box synthesis methods (both the real data and fake methods are known) and 9 unknown/black-box methods (one of the real data and fake methods is unknown).
- Aligned data domain: the proposed 31 known/white-box methods are applied to the real data from the widely used FF++ and Celeb-DF datasets, meaning our generated fakes and their original fakes are from the same data domain.
- Comprehensive benchmarking: Our benchmark conducts more than 2,000 evaluations with 4 standard evaluation protocols, leading to several new findings with insightful analysis.

Why do this? (motivation)

In this work, we aim to address the following challenges in the current deepfake detection research (especially the datasets):
(1) Forgery Diversity: The term "deepfake" commonly covers both face forgery (face-swapping and face-reenactment) and entire image synthesis (AIGC, especially of faces). Most existing datasets cover only some of these types and implement only a few forgery methods (e.g., 2 swapping and 2 reenactment methods in FF++);
(2) Forgery Realism: The dominant training datasets, such as FF++, contain out-of-date forgery techniques from the past four years. "Honing skills" on these forgeries makes it difficult to guarantee that detectors generalize to today's SoTA deepfakes;
(3) Evaluation Protocol: Most detection works evaluate on a single forgery type (e.g., face-swapping only), which hinders the development of universal deepfake detectors.
To address these challenges, we construct a highly diverse and large-scale deepfake detection dataset called DF40, which comprises 40 distinct deepfake techniques and offers realism, diversity, and comprehensiveness. We also open up several valuable yet previously underexplored research questions to inspire future work.


Dataset Details & Download

  • Real Data:
      - For "known" methods: Each "known" method includes fake data from two domains: FaceForensics++ (ff) and Celeb-DF (cdf). To obtain the real data for both training and testing purposes, please use the following links: FaceForensics++ real data and Celeb-DF real data.
      - For "unknown" methods: The real data is already included within the folder, so there is NO additional download link required for the real data of the unknown methods.
  • DF40-test (after preprocessing):
      - Description: We provide the download link for the whole DF40 testing data after preprocessing (frame extraction and face cropping), including fake images only.
      - Size: The whole size is ~93G. Please refer to the below illustration and our GitHub for the size of each specific method.
      - Folder structure: See the illustration below. Red color is the hyper-link to download data.
      
      DF40_test (download DF40-test-all, ~93G)
      │
      ├── known
      │   ├── FS (download FS-only, ~19.2G)
      │   │   ├── FSGAN
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── FaceSwap
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── SimSwap
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── InSwapper
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── BlendFace
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── UniFace
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── MobileSwap
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── e4s
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   └── FaceDancer
      │   │       ├── ff
      │   │       └── cdf
      │   ├── FR (download FR-only, ~19.3G)
      │   │   ├── FOMM
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── FS_vid2vid
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── Wav2Lip
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── MRAA
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── OneShot
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── PIRender
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── TPSM
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── LIA
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── DaGAN
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── SadTalker
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   ├── MCNet
      │   │   │   ├── ff
      │   │   │   └── cdf
      │   │   └── HeyGen
      │   │       ├── ff
      │   │       └── cdf
      │   └── EFS (download EFS-only, ~44.5G)
      │       ├── VQGAN
      │       │   ├── ff
      │       │   └── cdf
      │       ├── StyleGAN2
      │       │   ├── ff
      │       │   └── cdf
      │       ├── StyleGAN3
      │       │   ├── ff
      │       │   └── cdf
      │       ├── StyleGAN-XL
      │       │   ├── ff
      │       │   └── cdf
      │       ├── SD-2.1
      │       │   ├── ff
      │       │   └── cdf
      │       ├── DDPM
      │       │   ├── ff
      │       │   └── cdf
      │       ├── RDDM
      │       │   ├── ff
      │       │   └── cdf
      │       ├── PixArt
      │       │   ├── ff
      │       │   └── cdf
      │       ├── DiT-XL/2
      │       │   ├── ff
      │       │   └── cdf
      │       └── SiT-XL/2
      │           ├── ff
      │           └── cdf
      │
      └── unknown (download unknown-all, ~10G)
          ├── FS
          │   └── DeepFaceLab
          │       ├── fake
          │       └── real
          ├── FR
          │   └── HeyGen
          │       ├── fake
          │       └── real
          ├── EFS
          │   ├── MidJourney
          │   │   ├── fake
          │   │   └── real
          │   └── whichfaceisreal
          │       ├── fake
          │       └── real
          └── FE
              ├── starGAN
              │   ├── fake
              │   └── real
              ├── starGAN-v2
              │   ├── fake
              │   └── real
              ├── StyleCLIP
              │   ├── fake
              │   └── real
              ├── e4e
              │   ├── fake
              │   └── real
              └── CollabDiff
                  ├── fake
                  └── real
      
      
      Note: methods in the "known" folder contain fake data under the "FaceForensics++" (ff) and "Celeb-DF" (cdf) data domains. "unknown" means other unknown data domains.
  • DF40-train (after preprocessing):
      - Description: Similar to the DF40-test, we provide the processed fake images for training in this link. Please note that the training set ONLY includes the "known" methods and utilizes the FaceForensics++ (ff) domain for training. The Celeb-DF (cdf) domain is not used for training purposes but for testing only.
      - Size: The whole size is ~68.3G.
      - Folder structure: See the illustration below. Red color is the hyper-link to download data.
      
      DF40_train (download DF40-train-all, ~68.3G)
      │
      ├── FS (download FS-only, ~14.9G)
      │   ├── FSGAN
      │   ├── FaceSwap
      │   ├── SimSwap
      │   ├── InSwapper
      │   ├── BlendFace
      │   ├── UniFace
      │   ├── MobileSwap
      │   ├── e4s
      │   └── FaceDancer
      ├── FR (download FR-only, ~20.9G)
      │   ├── FOMM
      │   ├── FS_vid2vid
      │   ├── Wav2Lip
      │   ├── MRAA
      │   ├── OneShot
      │   ├── PIRender
      │   ├── TPSM
      │   ├── LIA
      │   ├── DaGAN
      │   ├── SadTalker
      │   ├── MCNet
      │   └── HeyGen
      └── EFS (download EFS-only, ~32.5G)
          ├── VQGAN
          ├── StyleGAN2
          ├── StyleGAN3
          ├── StyleGAN-XL
          ├── SD-2.1
          ├── DDPM
          ├── RDDM
          ├── PixArt
          ├── DiT-XL/2
          └── SiT-XL/2
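
For readers who want to load the data programmatically, the following is a minimal sketch of how a local copy of DF40_test could be indexed, assuming it is unpacked exactly as in the illustration above. The names `Sample` and `index_df40_test` are hypothetical and not part of the official code. It is also an assumption that frames are stored as .png/.jpg/.jpeg files and that method names containing a slash (e.g. DiT-XL/2) appear on disk as nested folders. Fake labels for the "known" folders follow from the description above (fake images only).

```python
from pathlib import Path
from typing import Iterator, NamedTuple

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumption: common image formats
DOMAIN_DIRS = {"ff", "cdf"}                 # sub-folders of each "known" method
LABEL_DIRS = {"fake", "real"}               # sub-folders of each "unknown" method


class Sample(NamedTuple):
    path: Path
    split: str      # "known" or "unknown"
    category: str   # FS, FR, EFS, or FE
    method: str     # e.g. "SimSwap"
    domain: str     # "ff" / "cdf" for known methods, "unknown" otherwise
    label: int      # 1 = fake, 0 = real


def index_df40_test(root) -> Iterator[Sample]:
    """Yield one Sample per image found under a local DF40_test folder."""
    root = Path(root)
    for image in sorted(root.rglob("*")):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        parts = image.relative_to(root).parts
        if len(parts) < 5 or parts[0] not in ("known", "unknown"):
            continue
        split, category = parts[0], parts[1]
        markers = DOMAIN_DIRS if split == "known" else LABEL_DIRS
        # Find the domain/label folder; everything between the category
        # and that folder is treated as the method name.
        hit = next(
            (i for i in range(3, len(parts) - 1) if parts[i] in markers), None
        )
        if hit is None:
            continue
        method = "/".join(parts[2:hit])
        if split == "known":
            domain, label = parts[hit], 1  # known folders hold fake images only
        else:
            domain, label = "unknown", int(parts[hit] == "fake")
        yield Sample(image, split, category, method, domain, label)
```

A call such as `list(index_df40_test("DF40_test"))` then gives a flat list that can be filtered by category, method, or domain. Real images for the "known" methods are not part of this folder and would have to be added from the FaceForensics++ and Celeb-DF real data.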
      
      

    Main Results


    We conduct the main evaluations under four protocols: cross-forgery evaluation (Protocol-1), cross-domain evaluation (Protocol-2), evaluation toward unknown forgeries and domains (Protocol-3), and One-Versus-All (OvA) evaluation (Protocol-4).
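
To make the first two protocols concrete, here is a small sketch that enumerates train/test combinations. It rests on an assumed reading of the protocol names and is not the authors' exact setup: cross-forgery is taken to mean training on one forgery category and testing on each category within the same data domain, and cross-domain is taken to mean training on the ff domain and testing on the cdf domain. The function name `protocol_pairs` is hypothetical.

```python
from itertools import product

# The three forgery categories available among the "known" methods.
KNOWN_CATEGORIES = ("FS", "FR", "EFS")


def protocol_pairs(train_domain, test_domain, categories=KNOWN_CATEGORIES):
    """List every (train category, test category) combination for two domains."""
    return [
        {"train": (train_cat, train_domain), "test": (test_cat, test_domain)}
        for train_cat, test_cat in product(categories, repeat=2)
    ]


# Assumed reading of Protocol-1: same domain, varying forgery category.
cross_forgery = protocol_pairs("ff", "ff")
# Assumed reading of Protocol-2: train on ff, test on cdf.
cross_domain = protocol_pairs("ff", "cdf")
```

Each entry names one training set and one testing set, so a benchmark loop can iterate over the list and record one score per detector and entry.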
    (Results-1) The results of Protocol-1:


    (Results-2) The results of Protocol-2:


    (Results-3) The results of Protocol-3:

    (Results-4) The results of Protocol-4:


    Additional Results & Analysis


    We also provide the following additional results to enlarge our evaluation scope.
    (Results-5) Train on the DF40 "known" methods and test on "unknown" methods of DF40:


    (Results-6) Train on DF40 and test on non-face AIGCs (GenImage dataset):


    (Results-7) Train on previous dataset (FaceForensics++) and test on DF40:

    (Results-8) Train on DF40 and test on other deepfake datasets:

    (Analysis-1) Logits and confidence distribution analysis for both fake and real classes:

    (Analysis-2) t-SNE visualizations for different models on distinct testing data:

    (Analysis-3) Frequency-level artifacts analysis for different synthesis methods:


    Takeaways & Findings

    Here, we very briefly outline several key takeaways to encapsulate our contributions and conclusions:
    - 1. The dataset is pivotal in addressing the generalization problem in deepfake detection. To this end, we build a highly diverse deepfake dataset incorporating 40 distinct deepfake techniques, including the most recent ones. Visual examples can be seen in the supplementary material of our paper;
    - 2. Data domain and forgery method collectively determine the final detection results (see the causal graph in our main paper for intuitive understanding);
    - 3. Many recent face-swapping methods (e.g., SimSwap) do NOT involve a blending process but generate all content directly (including the background);
    - 4. Blending is not all you need to detect deepfakes, even for face-swapping forgeries;
    - 5. CLIP-large is the most powerful baseline model, owing to its notable ability to learn a robust real-face distribution;
    - 6. A model trained on the face domain (our dataset) can also be applied, to some extent, to non-face (pure AIGC) fakes;
    - 7. There is an urgent need for an effective incremental learning framework that achieves "1+1>2" results when many different forgeries are combined for training.


    Visual Examples

    We show visual examples of FS, FR, EFS, and unknown methods (including FE) from our DF40 dataset, illustrated below.



    Paper and Supplementary Material

    Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Li Yuan, Chengjie Wang, Shouhong Ding, Yunsheng Wu.
    DF40: Toward Next-Generation Deepfake Detection.
    (hosted on ArXiv)


    [Bibtex]


    Acknowledgements

    This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.