movie_reviews_data

Movie Reviews Challenge

Dataset for Movie Reviews Challenge

other
10M<n<100M
Text Classification
English
by @AIOZNetwork

Last updated: 23 days ago


Movie Reviews Data

Summary

Introduction

This dataset is derived from the Large Movie Review Dataset, introduced by Maas et al. (2011) and published by the Association for Computational Linguistics. It provides labeled data for binary sentiment classification: each movie review is paired with a sentiment label. The dataset is used in the Movie Reviews Challenge, where participants develop and evaluate sentiment classification models.

Dataset Structure

Files

The dataset is distributed as two .zip archives containing the following files when extracted:

  • train_1.csv:
    • 17,484 labeled reviews for model training
  • train_2.csv:
    • Additional 17,484 labeled reviews for model training
  • test.csv:
    • 14,987 unlabeled reviews for prediction submission
    • Note: Submissions are evaluated and ranked on the Public Leaderboard

An example from a training file (train_1.csv or train_2.csv):

review_index,review,sentiment
0,'The film is so bad',negative

An example from test.csv:

review_index,review
0,'The film is so bad'
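
As a minimal sketch of how the extracted files might be read, the helper below uses Python's standard csv module to combine train_1.csv and train_2.csv into one list of rows. The function name load_reviews is hypothetical, and the code assumes UTF-8 encoding, a header row as in the examples above, and standard CSV quoting; adjust these if the actual files differ.

```python
import csv
from pathlib import Path


def load_reviews(paths):
    """Read one or more review CSV files into a single list of dicts.

    Hypothetical helper: assumes UTF-8 files with a header row
    (review_index, review, and optionally sentiment).
    """
    rows = []
    for path in paths:
        # newline="" lets the csv module handle line breaks inside quoted reviews.
        with Path(path).open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

For instance, load_reviews(["train_1.csv", "train_2.csv"]) would return one dict per review with the keys review_index, review, and sentiment, all as strings. Applied to test.csv, the same helper yields rows without the sentiment key.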

Data Fields

The training dataset contains the following fields:

  • review_index: A unique identifier for each review.
  • review: The text content of the movie review.
  • sentiment: The sentiment label assigned to the review, either Positive or Negative.

The test dataset includes the same fields, except it does not contain the sentiment label.

Reference

For more information about the dataset, see the original movie review dataset.

License

The dataset has been published by the Association for Computational Linguistics.

Citation

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}