
X-CSR
To create the X-CSQA and X-CODAH datasets, the authors automatically translated the CSQA and CODAH datasets, originally available only in English, into 15 other languages.
Summary
Introduction
To evaluate multilingual language models (ML-LMs) for commonsense reasoning in a cross-lingual zero-shot transfer setting (X-CSR), i.e., training on English and testing on other languages, we create two benchmark datasets, X-CSQA and X-CODAH. Specifically, we automatically translate the original CSQA and CODAH datasets, which have only English versions, into 15 other languages, forming development and test sets for studying X-CSR. As our goal is to evaluate different ML-LMs under a unified evaluation protocol for X-CSR, we argue that such translated examples, although they might contain noise, can serve as a starting benchmark for obtaining meaningful analysis until more human-translated datasets become available.
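The zero-shot transfer protocol described above amounts to reporting accuracy separately for each target language. The following is a minimal sketch of that bookkeeping, not the authors' evaluation script: the function name `accuracy_by_language`, the toy ids, and the prediction format (a dict keyed by `(id, lang)`) are assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_language(examples, predictions):
    """Return {lang: accuracy} given gold examples and predicted labels.

    `examples` is a list of dicts with "id", "lang" and "answerKey".
    `predictions` maps (id, lang) to a predicted label (hypothetical format).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        lang = ex["lang"]
        total[lang] += 1
        if predictions.get((ex["id"], lang)) == ex["answerKey"]:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy, made-up data: two questions, each available in two languages.
examples = [
    {"id": "q1", "lang": "en", "answerKey": "D"},
    {"id": "q2", "lang": "en", "answerKey": "A"},
    {"id": "q1", "lang": "de", "answerKey": "D"},
    {"id": "q2", "lang": "de", "answerKey": "A"},
]
predictions = {
    ("q1", "en"): "D",
    ("q2", "en"): "A",
    ("q1", "de"): "D",
    ("q2", "de"): "B",  # wrong on purpose
}

scores = accuracy_by_language(examples, predictions)
print(scores)  # {'en': 1.0, 'de': 0.5}
```

Because the `id` is shared across languages, the same grouping could also be used to compare a model's answers to one question across its translations.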
Dataset Structure
Data Instances
An example of the X-CSQA dataset:
{
  "id": "be1920f7ba5454ad",  # an id shared by all languages
  "lang": "en",  # one of the 16 language codes.
  "question": {
    "stem": "What will happen to your knowledge with more learning?",  # question text
    "choices": [
      {"label": "A", "text": "headaches"},
      {"label": "B", "text": "bigger brain"},
      {"label": "C", "text": "education"},
      {"label": "D", "text": "growth"},
      {"label": "E", "text": "knowing more"}
    ]
  },
  "answerKey": "D"  # hidden for test data.
}
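As a rough illustration of how such a record might be consumed, the sketch below parses the X-CSQA instance shown above (with the `#` annotations removed, since they are not valid JSON) and flattens it into a question, a list of choice texts, and the index of the gold answer. The helper name `to_multiple_choice` is hypothetical, and returning `None` for a missing `answerKey` is an assumption about how hidden test labels would appear.

```python
import json

# The instance from the example above, without the explanatory comments.
raw = """
{
  "id": "be1920f7ba5454ad",
  "lang": "en",
  "question": {
    "stem": "What will happen to your knowledge with more learning?",
    "choices": [
      {"label": "A", "text": "headaches"},
      {"label": "B", "text": "bigger brain"},
      {"label": "C", "text": "education"},
      {"label": "D", "text": "growth"},
      {"label": "E", "text": "knowing more"}
    ]
  },
  "answerKey": "D"
}
"""


def to_multiple_choice(instance):
    """Return (stem, choice_texts, gold_index); gold_index is None if no key is given."""
    question = instance["question"]
    labels = [choice["label"] for choice in question["choices"]]
    texts = [choice["text"] for choice in question["choices"]]
    key = instance.get("answerKey")
    gold_index = labels.index(key) if key in labels else None
    return question["stem"], texts, gold_index


instance = json.loads(raw)
stem, texts, gold_index = to_multiple_choice(instance)
print(stem)
print(texts[gold_index])  # growth
```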
An example of the X-CODAH dataset:
{
  "id": "b8eeef4a823fcd4b",  # an id shared by all languages
  "lang": "en",  # one of the 16 language codes.
  "question_tag": "o",  # one of 6 question types
  "question": {
    "stem": " ",  # always a blank as a dummy question
    "choices": [
      {"label": "A",
       "text": "Jennifer loves her school very much, she plans to drop every courses."},
      {"label": "B",
       "text": "Jennifer loves her school very much, she is never absent even when she's sick."},
      {"label": "C",
       "text": "Jennifer loves her school very much, she wants to get a part-time job."},
      {"label": "D",
       "text": "Jennifer loves her school very much, she quits school happily."}
    ]
  },
  "answerKey": "B"  # hidden for test data.
}
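Because the X-CODAH `stem` is always blank, each choice is already a complete sentence, so one plausible way to use an instance is to score every choice as a standalone candidate and pick the highest-scoring one. The sketch below illustrates that idea only. `build_candidates`, `predict`, and `placeholder_score` are hypothetical names, and the placeholder scorer (text length) stands in for a real model score and says nothing about plausibility.

```python
instance = {
    "id": "b8eeef4a823fcd4b",
    "lang": "en",
    "question_tag": "o",
    "question": {
        "stem": " ",
        "choices": [
            {"label": "A",
             "text": "Jennifer loves her school very much, she plans to drop every courses."},
            {"label": "B",
             "text": "Jennifer loves her school very much, she is never absent even when she's sick."},
            {"label": "C",
             "text": "Jennifer loves her school very much, she wants to get a part-time job."},
            {"label": "D",
             "text": "Jennifer loves her school very much, she quits school happily."},
        ],
    },
    "answerKey": "B",
}


def build_candidates(instance):
    """Return [(label, candidate_text)]; a blank stem contributes nothing to the text."""
    stem = instance["question"]["stem"].strip()
    candidates = []
    for choice in instance["question"]["choices"]:
        text = f"{stem} {choice['text']}".strip()
        candidates.append((choice["label"], text))
    return candidates


def predict(instance, score_fn):
    """Return the label of the candidate with the highest score under score_fn."""
    candidates = build_candidates(instance)
    best_label, _ = max(candidates, key=lambda pair: score_fn(pair[1]))
    return best_label


def placeholder_score(text):
    # Assumption: a stand-in for a model-based sentence score, NOT a real scorer.
    return len(text)


candidates = build_candidates(instance)
predicted = predict(instance, placeholder_score)
print(candidates[1][1])
print(predicted)
```

In practice `score_fn` would be replaced by whatever sentence-level score the evaluated model provides; the same helper also works for X-CSQA, where the stem is non-empty and gets prepended to each choice.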
Data Fields
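id: an id shared by all languages
lang: one of the 16 language codes
question_tag: one of 6 question types (appears in the X-CODAH example only)
question: an object containing:
  stem: the question text in X-CSQA; always a blank as a dummy question in X-CODAH
  choices: a list of answers, each answer has:
    label: a string answer identifier for each answer
    text: the answer text
answerKey: the label of the correct answer; hidden for test data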
Data Splits
X-CSQA: There are 8,888 examples for training in English, 1,000 for development in each language, and 1,074 examples for testing in each language.
X-CODAH: There are 8,476 examples for training in English, 300 for development in each language, and 1,000 examples for testing in each language.
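Since development and test sets exist for every one of the 16 languages, the total evaluation size is the per-language count multiplied by 16. The short sketch below works through that arithmetic using only the numbers stated above; the dictionary layout and key names are assumptions chosen for illustration.

```python
NUM_LANGUAGES = 16  # English plus the 15 translated languages

# Per-dataset sizes as stated in the Data Splits section.
SPLITS = {
    "X-CSQA": {"train_en": 8888, "dev_per_lang": 1000, "test_per_lang": 1074},
    "X-CODAH": {"train_en": 8476, "dev_per_lang": 300, "test_per_lang": 1000},
}

totals = {}
for name, sizes in SPLITS.items():
    totals[name] = {
        "dev_total": sizes["dev_per_lang"] * NUM_LANGUAGES,
        "test_total": sizes["test_per_lang"] * NUM_LANGUAGES,
    }
    print(name, totals[name])

# X-CSQA {'dev_total': 16000, 'test_total': 17184}
# X-CODAH {'dev_total': 4800, 'test_total': 16000}
```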
Reference
We would like to acknowledge Bill Yuchen Lin, Seyeon Lee, Xiaoyang Qiao, and Xiang Ren for creating and maintaining the X-CSR dataset as a valuable resource for the natural language processing and machine learning research community. For more information about the X-CSR dataset and its creators, please visit the X-CSR website.
License
The dataset has been released under the MIT License.
Citation
# X-CSR
@inproceedings{lin-etal-2021-common,
title = "Common Sense Beyond {E}nglish: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning",
author = "Lin, Bill Yuchen and
Lee, Seyeon and
Qiao, Xiaoyang and
Ren, Xiang",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.102",
doi = "10.18653/v1/2021.acl-long.102",
pages = "1274--1287",
abstract = "Commonsense reasoning research has so far been limited to English. We aim to evaluate and improve popular multilingual language models (ML-LMs) to help advance commonsense reasoning (CSR) beyond English. We collect the Mickey corpus, consisting of 561k sentences in 11 different languages, which can be used for analyzing and improving ML-LMs. We propose Mickey Probe, a language-general probing task for fairly evaluating the common sense of popular ML-LMs across different languages. In addition, we also create two new datasets, X-CSQA and X-CODAH, by translating their English versions to 14 other languages, so that we can evaluate popular ML-LMs for cross-lingual commonsense reasoning. To improve the performance beyond English, we propose a simple yet effective method {---} multilingual contrastive pretraining (MCP). It significantly enhances sentence representations, yielding a large performance gain on both benchmarks (e.g., +2.7{\%} accuracy for X-CSQA over XLM-R{\_}L).",
}
# CSQA
@inproceedings{Talmor2019commonsenseqaaq,
address = {Minneapolis, Minnesota},
author = {Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
doi = {10.18653/v1/N19-1421},
pages = {4149--4158},
publisher = {Association for Computational Linguistics},
title = {CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge},
url = {https://www.aclweb.org/anthology/N19-1421},
year = {2019}
}
# CODAH
@inproceedings{Chen2019CODAHAA,
address = {Minneapolis, USA},
author = {Chen, Michael and D{'}Arcy, Mike and Liu, Alisa and Fernandez, Jared and Downey, Doug},
booktitle = {Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for {NLP}},
doi = {10.18653/v1/W19-2008},
pages = {63--69},
publisher = {Association for Computational Linguistics},
title = {CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense},
url = {https://www.aclweb.org/anthology/W19-2008},
year = {2019}
}