| Sample | Similarity | CER |
|---|---|---|
| mohajj_greeting | 0.870 Excellent | 1.4% Excellent |
| arkan_alhajj | 0.914 Excellent | 21.4% Fair |
| best_model_710 | 0.880 Excellent | 4.9% Excellent |
| best_model_1455 | 0.639 Fair | 7.3% Good |

Speaker Similarity: cosine similarity between d-vector embeddings computed with resemblyzer (10 reference clips). CER: character error rate, computed on transcriptions from Azure GPT-4o-transcribe. STOI is omitted per sample because it requires the reference and generated audio to contain the same spoken content, and here the reference text differs from the generated text.

⚠ Note: CER may understate errors, because the transcription model can infer the correct text even when phonemes are missing (e.g. the letter س (sin) was observed to be dropped in some samples).
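
For readers who want to reproduce the two metrics, the following is a minimal sketch of the underlying arithmetic, not the evaluation script used for the table. It assumes the d-vector embeddings have already been extracted (for example with resemblyzer) and are passed in as plain lists of floats. It also assumes that the score over the reference clips is the mean of the per-clip cosine similarities, and that CER is Levenshtein distance divided by reference length. The function names `cosine_similarity`, `speaker_similarity`, `edit_distance`, and `cer` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def speaker_similarity(
    generated: Sequence[float], references: Sequence[Sequence[float]]
) -> float:
    """Assumption: average the cosine score against each reference clip."""
    if not references:
        raise ValueError("at least one reference embedding is required")
    scores = [cosine_similarity(generated, ref) for ref in references]
    return sum(scores) / len(scores)


def edit_distance(reference: str, hypothesis: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

As an illustration with made-up text, `cer("السلام", "اللام")` counts one deleted character out of six. That error is only registered if the transcription model outputs what was actually spoken rather than silently restoring the missing س, which is the caveat described in the note above.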