
MNIST
MNIST is a dataset of handwritten digits used to train and evaluate image classification models.
Summary
Introduction
The MNIST dataset consists of 70,000 grayscale images, each measuring 28x28 pixels. The images depict handwritten digits and were drawn from two databases maintained by the National Institute of Standards and Technology (NIST).
The dataset is split into a training set of 60,000 images and a test set of 10,000 images. Each digit from 0 to 9 is a separate class, for a total of 10 classes. The classes are approximately, but not exactly, balanced: each has roughly 7,000 images, about 6,000 for training and about 1,000 for testing.
The digits were written by two groups: Census Bureau employees and high school students. Each group contributed half of the images in the training set and half of the images in the test set.
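The figures above can be checked with a few lines of arithmetic, and per-class counts can be tallied from any list of labels. The following is a minimal sketch using only the Python standard library; the constant names and the class_counts helper are illustrative assumptions, not part of any MNIST tooling, and the example labels are made up.

```python
from collections import Counter

# Figures quoted in the introduction.
IMAGE_SIDE = 28
TRAIN_IMAGES = 60_000
TEST_IMAGES = 10_000
NUM_CLASSES = 10

total_images = TRAIN_IMAGES + TEST_IMAGES            # 70,000 images overall
pixels_per_image = IMAGE_SIDE * IMAGE_SIDE           # 784 pixels per image
mean_images_per_class = total_images / NUM_CLASSES   # 7,000 on average


def class_counts(labels):
    """Return {digit: count} for digits 0-9, including digits that never occur."""
    counts = Counter(labels)
    return {digit: counts.get(digit, 0) for digit in range(NUM_CLASSES)}


# Made-up labels, only to show the call; real counts come from the dataset.
example_counts = class_counts([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])
```

Passing the label column of a real split to class_counts gives the actual number of images per digit, which is the quickest way to see how evenly the classes are distributed.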
Dataset Structure
Data Instances
A sample from the training set is provided below:
{
'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x276021F6DD8>,
'label': 5
}
Data Fields
- image: A PIL.Image.Image object containing the 28x28 image. The image file is decoded automatically whenever the "image" column is accessed, for example with dataset[0]["image"]. Decoding a large number of image files is slow, so query the sample index first and then read the "image" column: dataset[0]["image"] decodes a single image, whereas dataset["image"][0] decodes the entire column before selecting the first element.
- label: The digit shown in the image, an integer from 0 to 9.
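The cost difference between the two access orders can be illustrated with a toy model. The sketch below does not use the real dataset library: LazyImageTable is a hypothetical stand-in that counts how often its placeholder decode step runs, on the assumption that column access decodes every row, as described for the image field above.

```python
class LazyImageTable:
    """Toy stand-in for a dataset whose image column is decoded on access."""

    def __init__(self, encoded_images, labels):
        self._encoded = list(encoded_images)
        self._labels = list(labels)
        self.decode_calls = 0  # number of images decoded so far

    def _decode(self, blob):
        # Placeholder for real image decoding; it only counts the call.
        self.decode_calls += 1
        return bytes(blob)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: decode only the requested row.
            return {"image": self._decode(self._encoded[key]),
                    "label": self._labels[key]}
        if key == "image":
            # Column access: decode every row in the column.
            return [self._decode(blob) for blob in self._encoded]
        if key == "label":
            return list(self._labels)
        raise KeyError(key)


table = LazyImageTable([b"\x00" * 784] * 1000, [5] * 1000)

_ = table[0]["image"]            # row first, then column
row_first_decodes = table.decode_calls

table.decode_calls = 0
_ = table["image"][0]            # column first, then row
column_first_decodes = table.decode_calls
```

In this model the row-first order decodes one image and the column-first order decodes all 1,000, which is why indexing the sample before the column is the cheaper pattern.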
Reference
We would like to acknowledge Yann LeCun, Corinna Cortes, and Christopher J.C. Burges for creating and maintaining the MNIST dataset as a valuable resource for the computer vision and machine learning research community. For more information about the MNIST dataset and its creators, please visit the MNIST dataset website.
License
The dataset is released under the MIT license.
Citation
@article{lecun2010mnist,
title={MNIST handwritten digit database},
author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
volume={2},
year={2010}
}