How is the dataset organized?

CommonGen

License: MIT | Size: 10K – 100K | Task: Text2Text Generation | Language: English

Opened about a year ago by @messe-7257

@messe-7257 a year ago

Could you break down the main components of CommonGen for me, and how the data is divided?

@trassi-5055 10 months ago

Hi @messe-7257, here is some information about the main components of this dataset (I read this in the dataset's details):
The dataset contains about 30,000 unique concept sets and 50,000 sentences. It is divided into three splits: 67,389 training instances, 4,018 validation instances, and 1,497 test instances. Each example has three fields: an integer concept_set_idx, a list of concepts, and the reference target sentence.
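
To make the layout a bit more concrete, here is a minimal sketch in plain Python of what one record looks like and how the split sizes add up. The field name `target` for the reference sentence is my assumption, the sample rows are invented for illustration (they are not taken from the dataset), and `group_references` is a hypothetical helper. The grouping step assumes that several rows can share the same concept_set_idx, one row per reference sentence.

```python
from collections import defaultdict
from typing import TypedDict


class CommonGenExample(TypedDict):
    concept_set_idx: int   # integer id of the concept set
    concepts: list[str]    # the concepts the sentence should cover
    target: str            # reference sentence (field name is an assumption)


# Invented rows for illustration only; not real dataset entries.
EXAMPLES: list[CommonGenExample] = [
    {"concept_set_idx": 0, "concepts": ["dog", "frisbee", "catch"],
     "target": "A dog catches a frisbee in the park."},
    {"concept_set_idx": 0, "concepts": ["dog", "frisbee", "catch"],
     "target": "The dog jumps to catch the frisbee."},
    {"concept_set_idx": 1, "concepts": ["guitar", "stage", "play"],
     "target": "A man plays guitar on stage."},
]


def group_references(rows: list[CommonGenExample]) -> dict[int, list[str]]:
    """Collect all reference sentences that share a concept_set_idx."""
    grouped: dict[int, list[str]] = defaultdict(list)
    for row in rows:
        grouped[row["concept_set_idx"]].append(row["target"])
    return dict(grouped)


# Split sizes as quoted in the reply above.
SPLIT_SIZES = {"train": 67_389, "validation": 4_018, "test": 1_497}
TOTAL_INSTANCES = sum(SPLIT_SIZES.values())

if __name__ == "__main__":
    for idx, refs in group_references(EXAMPLES).items():
        print(idx, len(refs), "reference(s)")
    print("total instances:", TOTAL_INSTANCES)
```

If you load the real data with the Hugging Face `datasets` library, you should be able to check these fields and counts directly; the exact dataset identifier on the Hub is not stated in this thread, so look it up on the dataset page first.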