Generated Voice Samples

MoHajj Greeting
6.6 s, 24 kHz, checkpoint: best_model_710, ref: zahra_0001.wav

Pillars of Hajj (أركان الحج)
19.8 s, 24 kHz, checkpoint: best_model_710, ref: zahra_0001.wav
⚠ Input text truncated (166-character limit)

Quality Metrics

Speaker Similarity: 0.73 (Good)
Cosine similarity (d-vector)

CER: 8.7% (Good)
Character Error Rate

STOI: 0.16
Not valid (different content)

Per-Sample Breakdown

Sample            Similarity          CER
mohajj_greeting   0.870 (Excellent)   1.4% (Excellent)
arkan_alhajj      0.914 (Excellent)   21.4% (Fair)
best_model_710    0.880 (Excellent)   4.9% (Excellent)
best_model_1455   0.639 (Fair)        7.3% (Good)

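Speaker Similarity: cosine similarity of d-vector embeddings, computed with resemblyzer against 10 reference clips. CER: character error rate, with transcripts from Azure GPT-4o-transcribe. STOI is omitted per sample because it requires the reference and generated audio to contain the same spoken content, which is not the case here (reference ≠ generated text).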
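As an illustration of the similarity score, the sketch below computes cosine similarity between one generated-clip embedding and several reference embeddings, then averages the results. The report does not say how the 10 reference clips were aggregated, so the averaging step is an assumption. The function names and toy vectors are hypothetical, and the code uses plain Python lists rather than resemblyzer output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def mean_similarity(generated, references):
    """Average similarity of one embedding against several references.

    Assumption: per-reference scores are averaged; the report does not
    state the aggregation that was actually used.
    """
    if not references:
        raise ValueError("at least one reference embedding is required")
    scores = [cosine_similarity(generated, ref) for ref in references]
    return sum(scores) / len(scores)


# Toy 2-dimensional "embeddings" (real d-vectors have many more dimensions).
generated = [1.0, 0.0]
references = [[1.0, 0.0], [0.0, 1.0]]
score = mean_similarity(generated, references)
```

With real d-vectors, the inputs would be the embeddings of the generated clip and of each reference clip.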

⚠ Note: CER may understate errors — the transcription model can infer correct text even when phonemes are missing (e.g. the letter ‎س (sin) was observed to be dropped in some samples).
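For reference, CER is conventionally the character-level edit distance between the intended text and the transcript, divided by the length of the intended text. The sketch below is a minimal version of that definition. It assumes no text normalization (for Arabic, stripping diacritics would be a typical extra step), which may differ from the pipeline behind the figures above. The function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# A single dropped character in a six-character reference.
example = cer("abcdef", "abdef")
```

A dropped letter only raises CER if the transcript actually omits it. If the transcription model restores the letter from context, the measured CER stays at zero for that word, which is the understatement described in the note.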

Training Details

Model & Hyperparameters

Model: XTTS v2 (GPTTrainer)
Framework: coqui-tts 0.27.5
GPU: NVIDIA A10G 24 GB
Learning Rate: 5e-6
Batch Size: 4
Max Epochs: 10
Early Stop Patience: 3 (triggered at epoch 7)
Epochs Run: 8 (0–7)

Best Model

Checkpoint: best_model_710.pth (epoch 4, step 710)
Eval Loss: 3.541
Train Loss: 3.166
Size: 5.2 GB
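The step number in the checkpoint name can be cross-checked against the dataset size. The sketch below assumes one optimizer step per batch (no gradient accumulation) and that the final partial batch of each epoch is kept. Neither is stated in the report. The function names are hypothetical.

```python
import math


def steps_per_epoch(train_clips, batch_size):
    """Batches per epoch, keeping the final partial batch (assumption)."""
    return math.ceil(train_clips / batch_size)


def epoch_index_at_step(step, train_clips, batch_size):
    """0-based index of the epoch that ends exactly at `step`.

    Returns None when `step` does not fall on an epoch boundary.
    """
    per_epoch = steps_per_epoch(train_clips, batch_size)
    if step <= 0 or step % per_epoch != 0:
        return None
    return step // per_epoch - 1


# Values from this report: 567 training clips, batch size 4, step 710.
per_epoch = steps_per_epoch(567, 4)
best_epoch = epoch_index_at_step(710, 567, 4)
```

Under these assumptions there are 142 steps per epoch, so step 710 is the end of the fifth epoch, which is epoch 4 when counting from 0.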

Eval Loss per Epoch

Epoch 0: 3.695
Epoch 1: 3.597
Epoch 2: 3.559
Epoch 3: 3.547
Epoch 4: 3.541 ★ (best)
Epoch 5: 3.546
Epoch 6: 3.542
Epoch 7: 3.546
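The eval losses above are consistent with a patience of 3. The sketch below replays them through a simple patience counter. It assumes "improvement" means a strictly lower eval loss with no minimum delta, which may not match the trainer's exact criterion. The function name is hypothetical.

```python
def simulate_early_stopping(eval_losses, patience):
    """Replay per-epoch eval losses through a patience counter.

    Returns (best_epoch, stop_epoch), where stop_epoch is None if the
    patience limit is never reached. Epochs are 0-based.
    """
    best_epoch = None
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(eval_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Eval loss per epoch, as listed in this report.
losses = [3.695, 3.597, 3.559, 3.547, 3.541, 3.546, 3.542, 3.546]
best_epoch, stop_epoch = simulate_early_stopping(losses, patience=3)
```

Epochs 5, 6, and 7 all fail to beat the epoch 4 loss of 3.541, so the counter reaches 3 at epoch 7.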

Run Info

W&B Run: y1fxug5b
Run Name: GPT_XTTS_zahra_FT
Started: 2026-04-18 15:32 UTC
Completed: 2026-04-18 15:56 UTC
Duration: 24 minutes

Data & Diarization

Training Data

Total Clips: 596
Train / Eval: 567 / 29
Total Duration: 41.0 minutes
Clip Duration: 2.0 – 11.6 seconds
Language: Arabic (ar)
Dataset File: zahra_dataset.json

Source Audio

YouTube: VNo9nOmaghA
Title: التعلق النافع | بودكاست فنجان (Beneficial Attachment | Fnjan Podcast)
Channel: إذاعة ثمانية (Thmanyah Radio, @thmanyahPodcasts)
Duration: 99 min 46 sec
Format: WAV, 48 kHz, stereo, 1.07 GB
Blob URL: voice-cloning/VNo9nOmaghA.wav
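The file size can be sanity-checked from the duration and format. The sketch below assumes uncompressed 16-bit PCM and ignores the WAV header. The bit depth is not stated in the report. The function name is hypothetical.

```python
def pcm_payload_bytes(duration_s, sample_rate_hz, channels, bits_per_sample):
    """Size of raw PCM audio data in bytes (WAV header not included)."""
    bytes_per_sample = bits_per_sample // 8
    return duration_s * sample_rate_hz * channels * bytes_per_sample


# 99 min 46 sec of 48 kHz stereo audio; 16-bit depth is an assumption.
duration_s = 99 * 60 + 46
size_bytes = pcm_payload_bytes(duration_s, 48_000, 2, 16)
size_gib = size_bytes / 2**30
```

Under that assumption the payload is about 1.07 GiB, so the reported "1.07 GB" matches if it is in binary units.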

Diarization

Model: pyannote/speaker-diarization-3.1
Target Speaker: SPEAKER_00 (Zahra)
Total Speakers: 2

Storage & Artifacts

Azure Blob Storage

Account: husazopenaipublic
Container: voice-cloning
Region: East US
Sample 1: reports/samples/zahra/sample_mohajj_greeting.wav
Sample 2: reports/samples/zahra/sample_arkan_alhajj.wav

Cosmos DB

Account: husazcomoserverless
Database: HusAWSCoquiVoiceCloning
Container: RawAudio (partition: /speaker)
Doc ID: VNo9nOmaghA