Audio Samples for AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response

In each table, the first clip is the imposter's voice, the second is the intended ground truth, and the third is a deepfake that the imposter creates from a recording of the target other than the ground truth. The last column is a short caption.

Audio Samples of Regular Speech

No Challenge: Deepfake Sounds Genuine

Imposter Attempting Challenge

Ground Truth (Intended)

Deepfake (Imp => Tar)

Audio Samples of the Top-11 Valid Machine-Detectable Challenges (target voice samples not shown)

Captions are prospective explanations, not machine predictions.

Static Mouth: Audible distortions at "Harmonies"
Cup Mouth: Non-compliance and Distortions
Whisper: Non-compliance
Speak Softly: Sounds Genuine
High Pitch: Non-compliance
Foreign Words: Vibrating Voice Distortions (Swahili: 'Simba anapenda kupumzika chini ya mti mkubwa.')
Sing: Non-compliance towards the start
Emotions: Sounds flatter in comparison to imposter
Crosstalk: Non-compliance and Distortions
Instr. Playback: Non-compliance and Distortions
Lyric Playback: Non-compliance and Distortions

Imposter Attempting Challenge

Ground Truth (Intended)

Deepfake (Imp => Tar)

Audio Samples of the 9 Weaker Tasks

Speak Loudly: Non-compliance (Deepfake softer compared to Imposter)
Read Quickly: Deepfake Sounds Genuine
Read Slowly: Mild Distortions
Hold Nose: Non-compliance
Low Pitch: Low-compliance
Accent: Vocal Distortions (Imposter: Southern-American Accent; no ground-truth)
Question: Deepfake Sounds Genuine
Cough/Whistle: Invalid Challenge as hard to randomize
Clap: Non-compliance

Imposter Attempting Challenge

Ground Truth (Intended)

Deepfake (Imp => Tar)

Original Video Samples (part of the open-source dataset)

High-Pitch

Cross-talk (with self-played audio on a phone)

Whisper