
DOCCI
The DOCCI dataset consists of comprehensive descriptions of 15k images that were taken specifically to evaluate text-to-image (T2I) and image-to-text (I2T) models. The descriptions cover many key details of each image.
Summary
Introduction
DOCCI (Descriptions of Connected and Contrasting Images) is a collection of images paired with detailed descriptions. The descriptions explain the key elements of each image, as well as secondary information such as background, lighting, and setting. The images were taken specifically to help assess how well models handle precise visual properties. DOCCI also includes many related images that differ from one another in key details. All descriptions are manually annotated to ensure they adequately distinguish each image from its counterparts.
Dataset Structure
Data Instances
{
'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1536x2048>,
'example_id': 'qual_dev_00000',
'description': 'An indoor angled down medium close-up front view of a real sized stuffed dog with white and black colored fur wearing a blue hard hat with a light on it. A couple inches to the right of the dog is a real sized black and white penguin that is also wearing a blue hard hat with a light on it. The dog is sitting, and is facing slightly towards the right while looking to its right with its mouth slightly open, showing its pink tongue. The dog and penguin are placed on a gray and white carpet, and placed against a white drawer that has a large gray cushion on top of it. Behind the gray cushion is a transparent window showing green trees on the outside.'
}
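As a minimal sketch of how a record shaped like the one above might be inspected, the helper below reads the three fields and reports the image dimensions and description length. The name `summarize_instance` is hypothetical and not part of any DOCCI tooling; the sketch assumes only that `image` exposes a PIL-style `size` attribute of `(width, height)`, and it uses a stand-in object with a shortened description so that it runs without the dataset.

```python
from types import SimpleNamespace


def summarize_instance(instance):
    """Return a small summary of a DOCCI-style record (hypothetical helper)."""
    # PIL images expose their dimensions as a (width, height) tuple.
    width, height = instance["image"].size
    return {
        "example_id": instance["example_id"],
        "width": width,
        "height": height,
        "description_words": len(instance["description"].split()),
    }


# Stand-in record for illustration only: a real record holds a PIL image
# and the full description text.
sample = {
    "image": SimpleNamespace(size=(1536, 2048)),
    "example_id": "qual_dev_00000",
    "description": "An indoor angled down medium close-up front view of a stuffed dog.",
}
summary = summarize_instance(sample)
print(summary)
```

With a real record, the same function should work unchanged, since it touches only the `size` attribute and the two string fields.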
Data Fields
Name | Explanation |
---|---|
image | The image, as a PIL.JpegImagePlugin.JpegImageFile object. |
example_id | The unique ID of the example, in the format <SPLIT_NAME>_<EXAMPLE_NUMBER>. |
description | Text description of the associated image. |
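Because a split name such as `qual_dev` itself contains an underscore, an `example_id` is best split at its last underscore. The sketch below illustrates one way to do this; `parse_example_id` is a hypothetical helper, and the assumption that the example number is always a run of digits is based only on the sample ID shown earlier.

```python
def parse_example_id(example_id):
    """Split an ID of the form <SPLIT_NAME>_<EXAMPLE_NUMBER> (hypothetical helper)."""
    # Split on the last underscore only, since split names such as
    # "qual_dev" contain underscores themselves.
    split_name, _, number = example_id.rpartition("_")
    if not split_name or not number.isdigit():
        raise ValueError(f"unexpected example_id format: {example_id!r}")
    return split_name, int(number)


print(parse_example_id("qual_dev_00000"))  # prints ('qual_dev', 0)
```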
Data Splits
Dataset | Train | Test | Qual Dev | Qual Test |
---|---|---|---|---|
DOCCI | 9,647 | 5,000 | 100 | 100 |
DOCCI-AAR | 4,932 | 5,000 | -- | -- |
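As a quick arithmetic check on the table above, the split sizes can be summed per dataset. The dictionary keys (`train`, `qual_dev`, and so on) are illustrative labels chosen for this sketch, not confirmed split identifiers, and splits marked `--` are simply omitted.

```python
# Split sizes copied from the table above; key names are illustrative.
SPLIT_SIZES = {
    "DOCCI": {"train": 9647, "test": 5000, "qual_dev": 100, "qual_test": 100},
    "DOCCI-AAR": {"train": 4932, "test": 5000},
}

# Total number of examples per dataset.
totals = {name: sum(splits.values()) for name, splits in SPLIT_SIZES.items()}
print(totals)  # prints {'DOCCI': 14847, 'DOCCI-AAR': 9932}
```

The DOCCI total of 14,847 is consistent with the "15k images" figure quoted in the summary.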
Reference
We would like to acknowledge Yasumasa Onoe, Sunayana Rane, et al. for creating and maintaining the DOCCI dataset as a valuable resource for the computer vision and machine learning research community. For more information about the DOCCI dataset and its creators, please visit the DOCCI website.
License
The dataset has been released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.
Citation
@inproceedings{OnoeDocci2024,
author = {Yasumasa Onoe and Sunayana Rane and Zachary Berger and Yonatan Bitton and Jaemin Cho and Roopal Garg and
Alexander Ku and Zarana Parekh and Jordi Pont-Tuset and Garrett Tanzer and Su Wang and Jason Baldridge},
title = {{DOCCI: Descriptions of Connected and Contrasting Images}},
booktitle = {arXiv},
year = {2024}
}