Visual question answering dataset
Updated May 7, 2025
This dataset of more than 11,000 multi-domain visual question-answer pairs is designed to improve the multimodal capabilities of AI models. It includes image-based questions, detailed answers, and expert explanations across science, business, and health and medicine.

Specifications
- Modalities: Image, text
- Language: English
- Volume: 11,350
- Average tokens per prompt-response pair (PRP): 50,709
- Number of tokens: 575,749,986
- Task category: Visual Question-Answering
- Domain: Science, Business, Health & Medicine
- Complexity: 3 levels, ranging from moderate to very hard


Explore our success stories
Evaluating a conversational AI model with a highly complex multimodal STEM dataset
Discover how our off-the-shelf science, technology, engineering and mathematics (STEM) dataset contributed to enhancing scientific reasoning and visual processing capabilities in a chatbot model crafted by a leading-edge tech and AI company.
- 4,485 physics prompt-response pairs

Improving large language model logic and reasoning with a specialized fine-tuning dataset
Explore how TELUS Digital created an off-the-shelf dataset to advance the capabilities of large language models (LLMs).
- 50K STEM-based prompt-response pairs created

Access the visual question answering dataset
Connect with our experts for pricing and samples.
Request the dataset


