How is the dataset organized?

CommonGen

License: MIT | Size: 10K – 100K | Task: Text2Text Generation | Language: English

Opened about a year ago by @messe-7257

@messe-7257, a year ago

Could you break down the main components of CommonGen for me, and how the data is divided?

@trassi-5055, 10 months ago

Hi @messe-7257, here is a summary of the dataset's main components (I read this in the Details section of the dataset card):
The dataset contains about 30,000 unique concept sets and 50,000 sentences. It is divided into three splits: 67,389 training instances, 4,018 validation instances, and 1,497 test instances. Each example has three parts: an integer concept_set_idx, a list of concepts, and a reference target sentence.
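
To make the record layout concrete, here is a minimal offline sketch in Python. The field name `target` for the reference sentence is my assumption (only `concept_set_idx` is named explicitly above), the three rows are made up for illustration and are not taken from the dataset, and `group_by_concept_set` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class CommonGenExample(TypedDict):
    concept_set_idx: int   # identifier of the concept set
    concepts: List[str]    # the concepts the sentence should cover
    target: str            # reference sentence (field name assumed)


# Split sizes as quoted in this reply.
SPLIT_SIZES = {"train": 67_389, "validation": 4_018, "test": 1_497}

# Made-up rows for illustration only.
examples: List[CommonGenExample] = [
    {"concept_set_idx": 0, "concepts": ["dog", "frisbee", "catch"],
     "target": "A dog catches a frisbee."},
    {"concept_set_idx": 0, "concepts": ["dog", "frisbee", "catch"],
     "target": "The dog jumps to catch the frisbee."},
    {"concept_set_idx": 1, "concepts": ["ski", "mountain"],
     "target": "A man skis down the mountain."},
]


def group_by_concept_set(rows: List[CommonGenExample]) -> Dict[int, List[str]]:
    """Collect all reference sentences that share a concept_set_idx."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for row in rows:
        grouped[row["concept_set_idx"]].append(row["target"])
    return dict(grouped)


references = group_by_concept_set(examples)
total_instances = sum(SPLIT_SIZES.values())
```

In this sketch several rows can share one `concept_set_idx`, which would explain why the number of instances is larger than the number of unique concept sets.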
