English (UK) remote speech dataset

This high-resolution mono audio dataset (48 kHz) features 4,550 scripted prompts recorded by 100 native speakers from diverse regions across the UK. It is well suited to training automatic speech recognition (ASR) models and voice AI applications tailored for multilingual and regional markets.

Specifications

Modalities: Audio
Language: English (UK) [en-GB]
Licensable: Yes
Total prompts: 4,550
Total audio length: 7 h 13 min
Average recording length (in sec): 5.71
Participants: 100
Group: Adults
Task category: Scripted prompts
Data type: Remote speech
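
As a quick sanity check on the figures above, the total audio length can be estimated from the prompt count and the average recording length. The snippet below is a minimal sketch, not part of the dataset tooling: the function name `estimated_duration` is hypothetical, and it assumes the listed average applies to all 4,550 prompts.

```python
def estimated_duration(total_prompts: int, avg_seconds: float) -> tuple[int, int]:
    """Return (hours, minutes) of audio implied by a prompt count and average length."""
    total_seconds = total_prompts * avg_seconds
    hours, remainder = divmod(total_seconds, 3600)
    # Drop leftover seconds so the result matches an hours:minutes style figure.
    return int(hours), int(remainder // 60)


# Figures taken from the specification list (assumed to cover every recording).
hours, minutes = estimated_duration(4550, 5.71)
print(f"{hours} h {minutes} min")  # prints "7 h 13 min"
```

The result agrees with the listed total audio length, which suggests the three figures are consistent with one another.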

Accelerate model development & training processes

Still searching for the right dataset? We can help.

Reach out and we’ll guide you to the right solution.

Explore our success stories

Evaluating a conversational AI model with a highly complex multimodal STEM dataset
Discover how our off-the-shelf science, technology, engineering and mathematics (STEM) dataset contributed to enhancing scientific reasoning and visual processing capabilities in a chatbot model crafted by a leading-edge tech and AI company.
- 4,485 physics prompt-response pairs
Improving large language model logic and reasoning with a specialized fine-tuning dataset
Explore how TELUS Digital created an off-the-shelf dataset to advance the capabilities of large language models (LLMs).
- 50K STEM-based prompt-response pairs created