AI Voice Cloning

Zahra El Shahry

زهرة الشهري

▶ التعلق النافع | بودكاست فنجان — إذاعة ثمانية

Training Complete

Generated Voice Samples

MoHajj Greeting

Duration: 6.6 s | Sample Rate: 24 kHz | Checkpoint: best_model_710 | Reference: zahra_0001.wav

Pillars of Hajj — أركان الحج

Duration: 19.8 s | Sample Rate: 24 kHz | Checkpoint: best_model_710 | Reference: zahra_0001.wav
⚠ Text truncated (166-character limit)
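One way to avoid the truncation flagged above is to split long input into chunks that each fit under the limit and synthesize them one at a time. The sketch below is only an illustration: `split_for_tts` and `_pack` are hypothetical helper names, the 166-character figure is taken from the warning, and the choice of punctuation to split on is an assumption.

```python
import re

# Character limit quoted in the truncation warning above.
CHAR_LIMIT = 166

# Split after Latin or Arabic punctuation: . ! ? plus U+061F (Arabic question
# mark), U+060C (Arabic comma) and U+061B (Arabic semicolon).
_BOUNDARY = re.compile(r"(?<=[.!?\u061f\u060c\u061b])\s+")


def _pack(pieces, limit):
    """Greedily join pieces with single spaces so each chunk is <= limit chars."""
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def split_for_tts(text, limit=CHAR_LIMIT):
    """Split text into chunks of at most `limit` characters.

    Clause boundaries are preferred; a clause that is itself too long is
    packed word by word. A single word longer than `limit` is left intact.
    """
    clauses = [c for c in _BOUNDARY.split(text.strip()) if c]
    pieces = []
    for clause in clauses:
        if len(clause) <= limit:
            pieces.append(clause)
        else:
            pieces.extend(_pack(clause.split(), limit))
    return _pack(pieces, limit)
```

Each returned chunk would then be synthesized separately and the resulting audio concatenated; how that is wired into the synthesis call is left out here because the report does not describe it.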

Quality Metrics

Metric	Value	Rating	Description
Speaker Similarity	0.73	Good	Cosine similarity (d-vector)
CER	8.7%	Good	Character Error Rate
STOI	0.16	Not valid	Different content

Per-Sample Breakdown

Sample	Similarity	CER
mohajj_greeting	0.870 Excellent	1.4% Excellent
arkan_alhajj	0.914 Excellent	21.4% Fair
best_model_710	0.880 Excellent	4.9% Excellent
best_model_1455	0.639 Fair	7.3% Good

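Speaker Similarity: cosine similarity of d-vector embeddings computed with resemblyzer against 10 reference clips. CER: character error rate between the input text and a transcript of the generated audio produced by Azure GPT-4o-transcribe. STOI is omitted per sample because it requires the reference and generated audio to contain the same spoken content, which is not the case here (reference ≠ generated text).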
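As a rough illustration of the speaker similarity score, the sketch below computes the cosine between one generated-clip embedding and each reference embedding, then averages the results. The report does not say how the 10 reference clips are combined, so the averaging step is an assumption, and `speaker_similarity` is a hypothetical helper name. In practice the embeddings would come from an encoder such as resemblyzer's `VoiceEncoder.embed_utterance`; plain lists of floats stand in for them here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def speaker_similarity(generated, references):
    """Mean cosine similarity between one embedding and several references.

    Averaging over references is an assumption; the report only states that
    10 reference clips were used.
    """
    if not references:
        raise ValueError("at least one reference embedding is required")
    scores = [cosine_similarity(generated, ref) for ref in references]
    return sum(scores) / len(scores)
```

An alternative design is to average the reference embeddings into a single centroid first and take one cosine against it; the two approaches generally give close but not identical numbers.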

⚠ Note: CER may understate errors. The transcription model can infer the correct text even when phonemes are missing; for example, the letter س (sin) was observed to be dropped in some samples.
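For reference, character error rate is conventionally the character-level edit distance between the transcript and the intended text, divided by the length of the intended text. The sketch below shows that standard definition; it does no text normalization (for example, stripping Arabic diacritics), and whether the reported numbers used any normalization is not stated.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance (insertions, deletions, substitutions) over characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With this definition, a transcript that is missing one character out of a four-character reference scores 0.25. The caveat in the note still applies: if the transcription model restores a letter that was never actually spoken, the transcript matches the text and the dropped phoneme does not show up in the CER.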

Training Details

Model & Hyperparameters

Model: XTTS v2 (GPTTrainer)
Framework: coqui-tts 0.27.5
GPU: NVIDIA A10G 24 GB
Learning Rate: 5e-6
Batch Size: 4
Max Epochs: 10
Early Stop Patience: 3 (triggered at epoch 7)
Epochs Run: 8 (0–7)

Best Model

Checkpoint: best_model_710.pth (epoch 4, step 710)
Eval Loss: 3.541
Train Loss: 3.166
Size: 5.2 GB
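The step number in the checkpoint name can be cross-checked against the dataset size. The sketch below assumes one optimizer step per batch, no gradient accumulation, and that the final partial batch of each epoch is kept; none of this is stated in the report, but under those assumptions the 567 training clips and batch size 4 reproduce step 710 at the end of epoch 4 (counting epochs from 0).

```python
import math


def global_step_at_epoch_end(train_clips, batch_size, epoch):
    """Global step count after finishing a zero-indexed epoch.

    Assumes one step per batch and that the last partial batch is kept.
    """
    steps_per_epoch = math.ceil(train_clips / batch_size)
    return steps_per_epoch * (epoch + 1)


# 567 training clips, batch size 4, best checkpoint saved after epoch 4.
print(global_step_at_epoch_end(567, 4, 4))
```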

Eval Loss per Epoch

Epoch	Eval Loss
0	3.695
1	3.597
2	3.559
3	3.547
4	3.541 ★ (best)
5	3.546
6	3.542
7	3.546
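The per-epoch eval losses above are consistent with a simple patience rule: stop once the loss has failed to improve on the best value for 3 consecutive epochs. The sketch below replays that generic rule on the reported numbers. It is not taken from the training script, and `replay_early_stopping` is a hypothetical helper name.

```python
def replay_early_stopping(eval_losses, patience):
    """Return (best_epoch, stop_epoch) for a plain patience-based rule.

    stop_epoch is None if the rule never triggers within the given losses.
    """
    best_epoch, best_loss, waited = None, float("inf"), 0
    for epoch, loss in enumerate(eval_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Eval losses for epochs 0 to 7 as reported above.
losses = [3.695, 3.597, 3.559, 3.547, 3.541, 3.546, 3.542, 3.546]
print(replay_early_stopping(losses, patience=3))  # (4, 7)
```

Epochs 5, 6, and 7 all stay above the epoch 4 loss of 3.541, so the rule fires at epoch 7 and the epoch 4 checkpoint remains the best.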

Run Info

W&B Run: y1fxug5b
Run Name: GPT_XTTS_zahra_FT
Started: 2026-04-18 15:32 UTC
Completed: 2026-04-18 15:56 UTC
Duration: 24 minutes

Data & Diarization

Training Data

Total Clips: 596
Train / Eval: 567 / 29
Total Duration: 41.0 minutes
Clip Duration: 2.0–11.6 seconds
Language: Arabic (ar)
Dataset File: zahra_dataset.json

Source Audio

YouTube: VNo9nOmaghA
Title: التعلق النافع | بودكاست فنجان
Channel: إذاعة ثمانية (@thmanyahPodcasts)
Duration: 99 min 46 sec
Format: WAV, 48 kHz, stereo, 1.07 GB
Blob URL: voice-cloning/VNo9nOmaghA.wav
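The reported file size can be sanity-checked from the duration and format. The sketch below assumes uncompressed 16-bit PCM, which the report does not state, and ignores the small WAV header. Under that assumption the payload is about 1.15 × 10^9 bytes, which is 1.07 when expressed in binary gigabytes (GiB).

```python
def pcm_payload_bytes(duration_s, sample_rate_hz, channels, bits_per_sample=16):
    """Size of raw PCM audio data in bytes (WAV header not included).

    The 16-bit default is an assumption; the report gives no bit depth.
    """
    return duration_s * sample_rate_hz * channels * (bits_per_sample // 8)


duration_s = 99 * 60 + 46  # 99 min 46 sec
size = pcm_payload_bytes(duration_s, 48_000, 2)
print(size, round(size / 2**30, 2))
```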

Diarization

Model: pyannote/speaker-diarization-3.1
Target Speaker: SPEAKER_00 (Zahra)
Total Speakers: 2
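After diarization, the turns attributed to the target speaker have to be picked out before they can become training clips. The sketch below shows one plausible selection step on already-extracted turns. Everything in it is an assumption made for illustration: the `Turn` type and `select_clips` helper are hypothetical, and reusing the 2.0 to 11.6 second range from the Training Data panel as a duration filter is a guess, since the report only lists that range as an observed statistic. With pyannote, the turns would typically be collected by iterating the pipeline output with `itertracks(yield_label=True)`.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    """One diarized speech turn (times in seconds)."""
    start: float
    end: float
    speaker: str


def select_clips(turns, target="SPEAKER_00", min_s=2.0, max_s=11.6):
    """Keep turns from the target speaker whose duration is within bounds.

    The bounds are illustrative defaults, not settings from the report.
    """
    return [
        turn
        for turn in turns
        if turn.speaker == target and min_s <= turn.end - turn.start <= max_s
    ]


# Made-up turns, purely to show the behaviour.
example = [
    Turn(0.0, 1.5, "SPEAKER_00"),    # too short
    Turn(1.5, 6.0, "SPEAKER_00"),    # kept
    Turn(6.0, 10.0, "SPEAKER_01"),   # other speaker
    Turn(10.0, 30.0, "SPEAKER_00"),  # too long
]
print(select_clips(example))
```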

Storage & Artifacts

Azure Blob Storage

Account: husazopenaipublic
Container: voice-cloning
Region: East US
Sample 1: reports/samples/zahra/sample_mohajj_greeting.wav
Sample 2: reports/samples/zahra/sample_arkan_alhajj.wav

Cosmos DB

Account: husazcomoserverless
Database: HusAWSCoquiVoiceCloning
Container: RawAudio (partition: /speaker)
Doc ID: VNo9nOmaghA