
Visual question answering dataset

Updated May 7, 2025

This dataset of more than 11,000 multi-domain visual question-answer pairs is designed to improve the multimodal capabilities of AI models. It includes image-based questions, detailed answers and expert explanations across science, business, health and medicine.

Specifications

Modalities: Image, text
Language: English
Volume: 11,350
Average tokens per prompt-response pair (PRP): 50,709
Number of tokens: 575,749,986
Task category: Visual question answering
Domain: Science, business, health and medicine
Complexity: 3 levels, ranging from moderate to very hard

Accelerate model development & training processes

Still searching for the right dataset? We can help.

Reach out and we’ll guide you to the right solution.

Explore our success stories

  • Evaluating a conversational AI model with a highly complex multimodal STEM dataset

    Discover how our off-the-shelf science, technology, engineering and mathematics (STEM) dataset contributed to enhancing scientific reasoning and visual processing capabilities in a chatbot model crafted by a leading-edge tech and AI company.

    • 4,485 physics prompt-response pairs
    Read the case study
  • Improving large language model logic and reasoning with a specialized fine-tuning dataset

    Explore how TELUS Digital created an off-the-shelf dataset to advance the capabilities of large language models (LLMs).

    • 50K STEM-based prompt-response pairs created
    Read the case study

Access the visual question answering dataset

Connect with our experts for pricing and samples.

Request the dataset