
Movie Reviews Challenge
Dataset for Movie Reviews Challenge
Movie Reviews Data
Summary
Introduction
This dataset is derived from the Large Movie Review Dataset published by the Association for Computational Linguistics. It provides labeled data for binary sentiment classification: each movie review is paired with its sentiment label. The dataset is used in the Movie Reviews Challenge, where participants develop and evaluate sentiment classification models.
Dataset Structure
Included Files
The dataset is distributed as two .zip archives containing the following files when extracted:
- train_1.csv: 17,484 labeled reviews for model training
- train_2.csv: an additional 17,484 labeled reviews for model training
- test.csv: 14,987 unlabeled reviews for prediction submission. Submissions are evaluated and ranked on the Public Leaderboard.
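Because the training data is split across two files, it usually needs to be combined before model training. The following is a minimal sketch using only the Python standard library. The function name `load_training_rows` and the extra `source_file` key are illustrative assumptions, not part of the dataset, and the two tiny files written in the demo are stand-ins rather than real reviews. Whether `review_index` values repeat between train_1.csv and train_2.csv is not stated here, so the sketch records the source file for each row as a precaution.

```python
import csv
import tempfile
from pathlib import Path


def load_training_rows(paths):
    """Read each CSV in `paths` and return all rows as one list of dicts."""
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Keep the file name in case review_index values repeat across files.
                row["source_file"] = Path(path).name
                rows.append(row)
    return rows


# Demo with two tiny stand-in files (made-up rows, not real dataset content).
# Real usage would pass the extracted train_1.csv and train_2.csv paths instead.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "train_1.csv"
    second = Path(tmp) / "train_2.csv"
    first.write_text(
        "review_index,review,sentiment\n0,Great film,positive\n", encoding="utf-8"
    )
    second.write_text(
        "review_index,review,sentiment\n0,Dull plot,negative\n", encoding="utf-8"
    )
    combined = load_training_rows([first, second])

print(len(combined))
```

With the real files, `load_training_rows(["train_1.csv", "train_2.csv"])` should return one row per labeled review across both files.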
An example from the training files (train_1.csv and train_2.csv):
review_index,review,sentiment
0,'The film is so bad',negative
An example from test.csv:
review_index,review
0,'The film is so bad'
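The example rows above wrap the review text in single quotes. Assuming the real files follow the same convention (this is an assumption based only on the examples shown), a CSV parser needs to be told about the quote character so that commas inside a review are not treated as field separators. A minimal sketch with the standard `csv` module, where `read_reviews` is a hypothetical helper name:

```python
import csv
import io


def read_reviews(handle, quotechar="'"):
    """Parse review rows from an open text handle into a list of dicts."""
    reader = csv.DictReader(handle, quotechar=quotechar)
    return [dict(row) for row in reader]


# The training example shown above, held in memory instead of a file.
sample_train = io.StringIO(
    "review_index,review,sentiment\n"
    "0,'The film is so bad',negative\n"
)
rows = read_reviews(sample_train)
print(rows[0]["review"], "|", rows[0]["sentiment"])
```

If the extracted files turn out to use double quotes instead, passing `quotechar='"'` restores the parser's default behavior.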
Data Fields
The training dataset contains the following fields:
- review_index: A unique identifier for each review.
- review: The text content of the movie review.
- sentiment: The sentiment label assigned to the review, either positive or negative.
The test dataset includes the same fields, except that it does not contain the sentiment label.
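Most binary classifiers expect numeric targets, so the string labels in the sentiment field typically need to be encoded first. The sketch below is one possible approach, not a prescribed one: the mapping of negative to 0 and positive to 1, and the names `LABEL_TO_ID` and `encode_sentiment`, are assumptions chosen for illustration. It normalizes case and surrounding whitespace as a precaution and rejects any other value.

```python
# Assumed mapping; the dataset itself only defines the two string labels.
LABEL_TO_ID = {"negative": 0, "positive": 1}


def encode_sentiment(label):
    """Map a sentiment string to 0 or 1, ignoring case and outer whitespace."""
    key = label.strip().lower()
    if key not in LABEL_TO_ID:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return LABEL_TO_ID[key]


# Example: encode the sentiment column of a few parsed rows.
labels = [encode_sentiment(value) for value in ["negative", "Positive", " negative "]]
print(labels)
```

Raising on unknown values, rather than silently defaulting, makes malformed rows visible early.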
Reference
For more information about the dataset, see the original Large Movie Review Dataset.
License
The dataset was published by the Association for Computational Linguistics.
Citation
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}