Datasets


Popular Datasets

BIG-bench

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.

TextVQA

TextVQA is a dataset to benchmark visual reasoning based on text in images. TextVQA requires models to read and reason about text in images to answer questions about them.

XQuAD

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark for evaluating cross-lingual question answering performance.

CommonGen

Building machines with commonsense to compose realistically plausible sentences is challenging. CommonGen is a constrained text generation task, associated with a benchmark dataset, to explicitly test machines for the ability of generative commonsense reasoning. Given a set of common concepts, the task is to generate a coherent sentence describing an everyday scenario using these concepts.

MathVista

MathVista is a diverse benchmark for mathematical reasoning in visual contexts. It includes 6,141 examples from 31 datasets.

MNIST

MNIST is a dataset of handwritten digit images used to train and evaluate image classification models.

TAL-SCQ5K

TAL-SCQ5K is a set of high-quality mathematical competition datasets created by TAL Education Group.

MMLU

MMLU is a new benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings.