German (Germany) remote speech dataset

This high-resolution mono audio dataset (48 kHz) features 4,550 scripted prompts recorded by 100 native German speakers from diverse regions across Germany. It is suited to training automatic speech recognition (ASR) models and voice AI applications for multilingual and regional markets.

Specifications

Modalities: Audio
Language: German (Germany) [de-DE]
Licensable: Yes
Total prompts: 4,550
Total audio length: 7 h 46 min
Average recording length: 6.15 sec
Participants: 100
Group: Adults
Task category: Scripted prompts
Data type: Remote speech
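
As a quick sanity check, the total audio length listed above should roughly equal the number of prompts multiplied by the average recording length. The following is a minimal sketch of that arithmetic; the function name `estimated_duration` is illustrative only and is not part of any dataset tooling.

```python
# Figures taken from the specifications above.
TOTAL_PROMPTS = 4550
AVG_RECORDING_SEC = 6.15


def estimated_duration(prompts: int, avg_sec: float) -> tuple[int, int]:
    """Estimate total duration as (hours, minutes), truncated to whole minutes."""
    total_sec = int(prompts * avg_sec)
    hours, remainder = divmod(total_sec, 3600)
    return hours, remainder // 60


hours, minutes = estimated_duration(TOTAL_PROMPTS, AVG_RECORDING_SEC)
print(f"Estimated total audio length: {hours} h {minutes:02d} min")
```

The estimate agrees with the listed total of 7 h 46 min, so the three figures are mutually consistent.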

Accelerate model development & training processes

  • Diverse regional accent coverage

    Leverage LLM-style commands and natural language prompts designed to simulate real-world voice AI interactions, recorded by native speakers to capture authentic German speech.

  • Formatted for multilingual model development

    Part of a broader multilingual dataset collection, with standardized scripts, audio specifications and quality controls, simplifying integration and benchmarking.

  • High-quality audio optimized for easy integration

    Delivered in 48 kHz mono with standardized silence padding and consistent remote recording conditions, for clean, ready-to-use data.
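
When integrating audio like this into a training pipeline, it can help to confirm that each file matches the stated 48 kHz mono format before use. The sketch below assumes the recordings are available as uncompressed WAV files, which this page does not state; the delivery format may differ. The function name `matches_spec` is hypothetical, and the check relies only on Python's standard `wave` module.

```python
import wave

# Audio format stated on this page: 48 kHz, mono.
EXPECTED_RATE_HZ = 48_000
EXPECTED_CHANNELS = 1


def matches_spec(path: str) -> bool:
    """Return True if the WAV file at `path` is 48 kHz mono.

    Assumption: the file is an uncompressed WAV readable by the `wave` module.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

A file that fails this check could be resampled or set aside for review before it reaches the training set.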

Still searching for the right dataset? We can help.

Reach out and we’ll guide you to the right solution.

Case Studies

Explore our success stories

  • Evaluating a conversational AI model with a highly complex multimodal STEM dataset

    Discover how our off-the-shelf science, technology, engineering and mathematics (STEM) dataset contributed to enhancing scientific reasoning and visual processing capabilities in a chatbot model crafted by a leading-edge tech and AI company.


    • 4,485 physics prompt-response pairs

    • 9,606 math prompt-response pairs

    Download case study
  • Improving large language model logic and reasoning with a specialized fine-tuning dataset

    Explore how TELUS Digital created an off-the-shelf dataset to advance the capabilities of large language models (LLMs).


    • 50K STEM-based prompt-response pairs created

    • 300 highly skilled contributors

    Download case study

Access the German (Germany) remote speech dataset

Connect with our experts for pricing and samples.