Visual question answering dataset

Updated May 7, 2025

This dataset of 11,500 multi-domain visual question-answer pairs is designed to improve the multimodal capabilities of AI models. It includes image-based questions, detailed answers and expert explanations across science, business, health and medicine.

Specifications

Modalities: Image, text
Language: English
Volume: 11,500
Average tokens per prompt-response pair (PRP): 48,165
Number of tokens: 554,619,975
Task category: Visual Question-Answering
Domain: Science, Business, Health & Medicine
Complexity: 3 levels ranging from moderate to very hard

Accelerate model development & training processes

Still searching for the right dataset? We can help.

Reach out and we’ll guide you to the right solution.

Explore our success stories

Evaluating a conversational AI model with a highly complex multimodal STEM dataset
Discover how our off-the-shelf science, technology, engineering and mathematics (STEM) dataset contributed to enhancing scientific reasoning and visual processing capabilities in a chatbot model crafted by a leading-edge tech and AI company.
- 4,485 physics prompt-response pairs
Improving large language model logic and reasoning with a specialized fine-tuning dataset
Explore how TELUS Digital created an off-the-shelf dataset to advance the capabilities of large language models (LLMs).
- 50K STEM-based prompt-response pairs created