English (U.S.) conversations in-studio speech dataset

This dataset features natural, unscripted conversations between adult native U.S. English speakers, each around 5 minutes long. Centered on various topics and role-playing scenarios, the dialogues capture spontaneous speech with all its complexities, such as pauses, overlaps, informal grammar and intonation.

Specifications

Modalities: Audio
Language: English (U.S.) [en-US]
Total prompts: 28
Total audio length: 2 h 22 min
Average recording length (in sec): 304.29
Participants: 16
Group: Adults
Task category: Unscripted conversations
Data type: In-studio speech
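
The duration figures in the specifications can be cross-checked with a little arithmetic. The sketch below assumes that "Total prompts" corresponds to one recording per prompt, which the listing does not state explicitly; the variable names are illustrative only.

```python
# Figures taken from the specifications list.
total_recordings = 28          # assumption: one recording per prompt
average_length_sec = 304.29

# Total duration implied by count x average length.
total_sec = total_recordings * average_length_sec
hours, remainder = divmod(round(total_sec), 3600)
minutes = remainder // 60

print(f"{total_sec:.2f} s is about {hours} h {minutes:02d} min")
```

Under that assumption, the product of the recording count and the average length agrees with the stated total audio length of roughly 2 hours 22 minutes.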

Accelerate model development & training processes

Authentic, unscripted conversations
Unscripted, free-form conversations between two speakers on given topics and role-play scenarios (e.g., between airline customer support and a customer about a flight delay and compensation) that capture natural linguistic features such as false starts, filler words and spontaneous turn-taking.
Rich speaker and scenario diversity
Covering a broad range of conversation types and topics, the dataset captures rich variation in language use, tone, pacing and speaker interaction, helping models learn to generalize across different conversational contexts, speaker personalities and interaction styles.
High-fidelity audio with acoustic control
Recorded in rooms at least 8' x 10' x 7' in size, with reverberation time (RT60) under 0.4 seconds at all frequencies, for minimal echo and background noise. Audio is delivered in 48 kHz, 24-bit mono .wav format, with 1-second silence padding and utterances normalized to -26 dB RMS active speech.
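
As an illustration of how the stated delivery format could be verified on receipt, here is a minimal sketch using Python's standard `wave` module. The function name `check_wav_format` and the constants are hypothetical and not part of any tooling supplied with the dataset. The check covers only the header fields (sample rate, bit depth, channel count); it does not measure silence padding or the -26 dB RMS level. It also assumes the files are plain uncompressed PCM, which is what `wave` reads.

```python
import wave

# Expected delivery format, taken from the description above.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 3   # 24-bit PCM
EXPECTED_CHANNELS = 1             # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV header and the expected format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("sample.wav")` (a placeholder file name) returns an empty list when the header matches 48 kHz, 24-bit mono, and a list of human-readable mismatches otherwise.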

Still searching for the right dataset? We can help.

Reach out and we’ll guide you to the right solution.

Case Studies

Explore our success stories

Evaluating a conversational AI model with a highly complex multimodal STEM dataset
Discover how our off-the-shelf science, technology, engineering and mathematics (STEM) dataset contributed to enhancing scientific reasoning and visual processing capabilities in a chatbot model crafted by a leading-edge tech and AI company.
- 4,485 physics prompt-response pairs
- 9,606 math prompt-response pairs
Improving large language model logic and reasoning with a specialized fine-tuning dataset
Explore how TELUS Digital created an off-the-shelf dataset to advance the capabilities of large language models (LLMs).
- 50K STEM-based prompt-response pairs created
- 300 highly skilled contributors

Access the English (U.S.) conversations in-studio speech dataset

Connect with our experts for pricing and samples.

Recommended datasets

Arabic (Saudi) in-studio speech dataset

English (India) remote speech dataset

English (UK) remote speech dataset
