Datasets
Popular Datasets
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.
by @AIOZNetwork

TextVQA is a dataset to benchmark visual reasoning based on text in images. TextVQA requires models to read and reason about text in images to answer questions about them.
by @AIOZNetwork

This dataset is a great resource for researchers who want to evaluate cross-lingual question answering performance.
by @AIOZNetwork

Building machines with commonsense to compose realistically plausible sentences is challenging. CommonGen is a constrained text generation task, associated with a benchmark dataset, that explicitly tests machines for the ability of generative commonsense reasoning. Given a set of common concepts, the task is to generate a coherent sentence describing an everyday scenario using these concepts.
by @AIOZNetwork

MathVista is a diverse benchmark for mathematical reasoning in visual contexts. It includes 6,141 examples drawn from 31 datasets.
by @AIOZNetwork

MNIST is a dataset of handwritten digits used to train and evaluate image classification models.
by @AIOZNetwork

TAL-SCQ5K are high-quality mathematical competition datasets created by TAL Education Group.
by @AIOZNetwork

MMLU is a new benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings.
by @AIOZNetwork
