Audio Samples for AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response

For each challenge, the first sample is the imposter's voice, the second is the intended ground truth, and the third is a deepfake that the imposter creates from a recording of the target other than the ground truth. The last column is a short caption.

Audio Samples of Regular Speech

No Challenge | Deepfake Sounds Genuine
Imposter Attempting Challenge
Ground Truth (Intended)
Deepfake (Imp => Tar)

Audio Samples of the Top-11 Valid Machine-Detectable Challenges (target voice samples not shown)

Captions are prospective explanations, not machine predictions.

Challenge | Caption
Static Mouth | Audible distortions at "Harmonies"
Cup Mouth | Non-compliance and distortions
Whisper | Non-compliance
Speak Softly | Sounds genuine
High Pitch | Non-compliance
Foreign Words | Vibrating voice distortions (Swahili: 'Simba anapenda kupumzika chini ya mti mkubwa.')
Sing | Non-compliance towards the start
Emotions | Sounds flatter in comparison to imposter
Crosstalk | Non-compliance and distortions
Instr. Playback | Non-compliance and distortions
Lyric Playback | Non-compliance and distortions
Imposter Attempting Challenge
Ground Truth (Intended)
Deepfake (Imp => Tar)

Audio Samples of the 9 Weaker Tasks

Challenge | Caption
Speak Loudly | Non-compliance (deepfake softer compared to imposter)
Read Quickly | Deepfake sounds genuine
Read Slowly | Mild distortions
Hold Nose | Non-compliance
Low Pitch | Low compliance
Accent | Vocal distortions (imposter: Southern-American accent; no ground truth)
Question | Deepfake sounds genuine
Cough/Whistle | Invalid challenge, as it is hard to randomize
Clap | Non-compliance
Imposter Attempting Challenge
Ground Truth (Intended)
Deepfake (Imp => Tar)

Original Video Samples (part of the open-source dataset)

High-Pitch
Cross-talk (with self-played audio on a phone)
Whisper