Arabic (Saudi) in-studio speech dataset
Featuring 1,392 prompts, this dataset supports wake word detection and command phrase recognition. Recorded in a studio environment by voice actors and native speakers of Arabic (Saudi dialect), it delivers clear, consistent mono-channel audio at 44.1 kHz and 24-bit depth.

Specifications
- Modalities: Audio
- Language: Arabic (Saudi) [ar-SA]
- Total prompts: 1,392
- Total audio length: 1:04 h
- Average recording length: 2.76 sec
- Participants: 29
- Group: Adults
- Task category: Scripted prompts
- Data type: In-studio speech
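The figures in the specifications can be cross-checked with simple arithmetic. The sketch below assumes one recording per prompt, which the listing does not state explicitly; the helper name `format_h_mm` is made up for illustration.

```python
# Figures taken from the specifications above.
TOTAL_PROMPTS = 1392
AVG_RECORDING_SEC = 2.76
PARTICIPANTS = 29


def format_h_mm(seconds: float) -> str:
    """Format a duration as H:MMh, rounded to the nearest minute."""
    total_minutes = round(seconds / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}h"


# Assumption: every prompt corresponds to exactly one recording.
estimated_total_sec = TOTAL_PROMPTS * AVG_RECORDING_SEC

print(format_h_mm(estimated_total_sec))   # estimated total audio length
print(TOTAL_PROMPTS / PARTICIPANTS)       # average prompts per participant
```

Under that assumption, the estimate agrees with the listed total audio length, and the prompts divide evenly across the 29 participants.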
Accelerate model development & training processes
High-fidelity, studio-recorded audio
Captured in a controlled professional studio environment to minimize background noise, echo, and other acoustic distortions. Standardized 1-second silence padding helps with utterance segmentation and noise analysis.
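If the silence padding needs to be removed before training, a minimal sketch using Python's standard `wave` module might look like the following. It assumes the 1-second padding sits at both the start and the end of each file, which the description does not confirm; `trim_padding` is a hypothetical helper, not part of any delivered tooling.

```python
import wave

PAD_SECONDS = 1.0  # padding length stated in the dataset description


def trim_padding(src, dst, pad_seconds: float = PAD_SECONDS) -> int:
    """Copy a WAV from src to dst without the assumed leading and trailing padding.

    src and dst may be file paths or binary file-like objects.
    Returns the number of audio frames written.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        pad_frames = int(round(pad_seconds * params.framerate))
        # Assumption: padding is present at BOTH ends of the recording.
        keep = params.nframes - 2 * pad_frames
        if keep <= 0:
            raise ValueError("recording is shorter than the assumed padding")
        reader.setpos(pad_frames)          # skip the leading padding
        frames = reader.readframes(keep)   # read only the utterance

    with wave.open(dst, "wb") as writer:
        # Preserve channel count, sample width (3 bytes for 24-bit), and rate.
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate)
        writer.writeframes(frames)
    return keep
```

Before trimming, the padded segments can also serve as noise-floor samples, so it may be worth keeping the original files alongside the trimmed copies.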
Scripted prompts with comprehensive coverage
Carefully scripted prompts to ensure consistency in phrasing while covering a wide range of wake words and command expressions, including multiple variations of how commands and wake words may be naturally phrased or used in different contexts.
Authentic Saudi dialect with phonetic details
Recorded exclusively by native Saudi Arabic speakers to capture authentic pronunciation, intonation and regional linguistic nuances, enabling models to detect subtle phonetic variations across varied speakers and acoustic environments.

Explore our success stories
Evaluating a conversational AI model with a highly complex multimodal STEM dataset
- 4,485 physics prompt-response pairs
- 9,606 math prompt-response pairs
Improving large language model logic and reasoning with a specialized fine-tuning dataset
- 50K STEM-based prompt-response pairs created
- 300 highly skilled contributors
Access the Arabic (Saudi) in-studio speech dataset
Connect with our experts for pricing and samples.
Explore our custom AI solutions
