Start

23/03/2026

Close

Email Spam Classification Challenge

Building models to identify spam email

Challenge Rewards: knowledge

Participants: 43

Submissions: 19

Overview

Email spam is one of the most common problems in digital communication, and detecting it is a classic challenge in machine learning. In this competition, your task is to build a model that can determine whether an email is spam or not based on its text content.

You'll work with a labeled dataset containing real-world email messages, giving you the opportunity to practice essential NLP skills such as text preprocessing, feature extraction, and classification. This challenge is beginner-friendly and designed to walk you through the full workflow of building a text-based machine learning model — from exploring the data to making accurate predictions.

Practice Skills

In this challenge, you will gain hands-on experience with:

  • Python
  • Natural Language Processing (NLP)
  • Text preprocessing and feature engineering
  • Classification algorithms

Evaluation

Goal

Train a classification model that predicts whether each email in the test set is spam or not spam.
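As a rough illustration of that goal, the sketch below trains a small multinomial Naive Bayes classifier using only the Python standard library. Naive Bayes is one common baseline for spam filtering and is an assumption here, not a method the challenge prescribes. The class name, the tokenizer, and the four training sentences are all made up for the example, and the code assumes both classes appear in the training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper class for illustration; labels are 0 (not spam)
    and 1 (spam), and both must occur in the training labels.
    """

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        # Log prior: share of training emails carrying this label.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        # Log likelihood of each known token under this label.
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, texts):
        preds = []
        for text in texts:
            tokens = tokenize(text)
            is_spam = self._log_score(tokens, 1) > self._log_score(tokens, 0)
            preds.append(1 if is_spam else 0)
        return preds


# Made-up toy data, only to show the fit/predict flow.
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesSpamClassifier().fit(train_texts, train_labels)
print(model.predict(["claim your free prize", "agenda for the team meeting"]))
# prints [1, 0]
```

On the real dataset, a library implementation with TF-IDF features would likely be a stronger starting point; this version only aims to make the steps visible.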

Metric

Models will be evaluated using the Accuracy metric:

  • Accuracy = Correct Predictions / Total Predictions
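The formula above can be checked locally before submitting. The following is a minimal sketch; the function name and the example labels are hypothetical, and the official scoring code may differ in details such as input validation.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 0 = not spam, 1 = spam
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 4 correct out of 5 -> 0.8
```

Note that accuracy can look high on imbalanced data: if most emails are not spam, always predicting 0 already scores well, so it is worth checking the class balance of the training set.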

Submission Format

Submission guide

The submission file has two fields:

  • email_index: The unique identifier for each email.
  • label: The predicted label for the corresponding email (0 = not spam, 1 = spam).

email_index    label
123            0
124            1
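A file with these two fields could be produced as sketched below. The page does not state the file type or name, so the CSV format, the header row, and the name submission.csv are assumptions; the two rows reuse the sample values from the guide (email 123 labeled 0, email 124 labeled 1).

```python
import csv


def write_submission(path, predictions):
    """Write (email_index, label) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["email_index", "label"])
        for email_index, label in predictions:
            if label not in (0, 1):
                raise ValueError(f"label must be 0 or 1, got {label!r}")
            writer.writerow([email_index, label])


# Assumed file name; replace the list with your model's predictions.
write_submission("submission.csv", [(123, 0), (124, 1)])
```

Validating the label values at write time catches a common mistake, such as writing probabilities or the strings "spam" and "ham" instead of 0 and 1.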

The challenge includes two types of submissions:

Public Submission

  • Use the public testing dataset (provided in the Data section) to generate your submission file.

Private Submission

  • You must submit your model.
  • It will be evaluated on a private testing dataset.
  • Important: Your code must recursively scan the entire data directory (see the Code section for details).
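One way to meet the recursive-scan requirement is sketched below using pathlib. The exact directory layout and file types are defined in the Code section, which is not reproduced here, so the directory name "data" and the ".csv" suffix filter are assumptions to adjust.

```python
from pathlib import Path


def find_data_files(data_dir, suffixes=(".csv",)):
    """Return every file under data_dir, at any depth, with a matching suffix."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # rglob("*") walks the directory tree recursively, including subfolders.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    # "data" is an assumed directory name.
    for path in find_data_files("data"):
        print(path)
```

Scanning recursively rather than hard-coding a file path means the code keeps working if the private test files sit in nested subfolders or have different names from the public ones.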