
NIH Chest X-Ray
NIH Chest X-Ray is a large dataset containing chest X-ray images of patients collected by the National Institutes of Health (NIH) of the United States.
NIH Chest X-ray dataset
Summary
Introduction
ChestX-ray dataset comprises 112,120 frontal-view X-ray images of 30,805 unique patients with the text-mined fourteen disease image labels (where each image can have multi-labels), mined from the associated radiological reports using natural language processing. Fourteen common thoracic pathologies include Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, Pleural_thickening, Cardiomegaly, Nodule, Mass and Hernia, which is an extension of the 8 common disease patterns listed in the author CVPR2017 paper. Note that original radiology reports (associated with these chest x-ray studies) are not meant to be publicly shared for many reasons. The text-mined disease labels are expected to have accuracy >90%. Please find more details and benchmark performance of trained models based on 14 disease labels in the author arxiv paper: 1705.02315
Dataset Structure
Data Instances
A sample from the training set is provided below:
{
'image_file_path': './datasets/downloads/extracted/95db46f21d556880cf0ecb11d45d5ba0b58fcb113c9a0fff2234eba8f74fe22a/images/00000798_022.png',
'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=1024x1024 at 0x7F2151B144D0>,
'labels': [9, 3]}
Data Fields
The dataset has the following fields:
- image_file_path a str with the image path
- image: A PIL.Image.Image object containing the image. Note that when accessing the image column: dataset[0]["image"] the image file is automatically decoded. Decoding of a large number of image files might take a significant amount of time. Thus it is important to first query the sample index before the "image" column, i.e. dataset[0]["image"] should always be preferred over dataset["image"][0].
- labels: an int classification label:
{ "No Finding": 0, "Atelectasis": 1, "Cardiomegaly": 2, "Effusion": 3, "Infiltration": 4, "Mass": 5, "Nodule": 6, "Pneumonia": 7, "Pneumothorax": 8, "Consolidation": 9, "Edema": 10, "Emphysema": 11, "Fibrosis": 12, "Pleural_Thickening": 13, "Hernia": 14 }
Reference
We would like to acknowledge the NIH Clinical Center for creating and maintaining the NIH Chest X-ray dataset as a valuable resource for the computer vision and machine learning research community. For more information about the NIH Chest X-ray dataset and its creator, please visit the NIH Chest X-ray dataset website.
License
The dataset has been released under MIT license.
Citation
@inproceedings{Wang_2017,
doi = {10.1109/cvpr.2017.369},
url = {https://doi.org/10.1109%2Fcvpr.2017.369},
year = 2017,
month = {jul},
publisher = {IEEE},
author = {Xiaosong Wang and Yifan Peng and Le Lu and Zhiyong Lu and Mohammadhadi Bagheri and Ronald M. Summers},
title = {ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases},
booktitle = {2017 {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})}
}