Start: 23/03/2026
Close: ∞
Email Spam Classification Challenge
Building models to identify spam email
Challenge Rewards: knowledge
Participants: 43
Submissions: 19
Overview
Email spam is one of the most common problems in digital communication, and detecting it is a classic challenge in machine learning. In this competition, your task is to build a model that can determine whether an email is spam or not based on its text content.
You'll work with a labeled dataset containing real-world email messages, giving you the opportunity to practice essential NLP skills such as text preprocessing, feature extraction, and classification. This challenge is beginner-friendly and designed to walk you through the full workflow of building a text-based machine learning model — from exploring the data to making accurate predictions.
Practice Skills
In this challenge, you will gain hands-on experience with:
- Python
- Natural Language Processing (NLP)
- Text preprocessing and feature engineering
- Classification algorithms
Evaluation
Goal
Train a classification model that predicts whether each email in the test set is spam or not spam.
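As a starting point, here is a minimal sketch of one possible approach: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is not an official baseline. The `tokenize` function, the `NaiveBayesSpamClassifier` class, and the toy training emails are all hypothetical and stand in for the real dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Assumes the training data contains at least one email of each class.
    """

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for label in (0, 1):
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            scores[label] = score
        return 1 if scores[1] > scores[0] else 0

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Toy data for illustration only (1 = spam, 0 = not spam).
train_texts = [
    "win a free prize now",
    "claim your free money today",
    "meeting agenda for monday",
    "please review the attached report",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesSpamClassifier().fit(train_texts, train_labels)
predictions = model.predict(["free prize money", "monday report review"])
print(predictions)
```

On the real dataset you would likely get better results from richer preprocessing and an established library, but the overall shape (tokenize, count, score each class, pick the larger score) stays the same.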
Metric
Models will be evaluated using the Accuracy metric:
- Accuracy = Correct Predictions / Total Predictions
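To check your model locally before submitting, the formula above can be sketched in a few lines. The function name `accuracy` and the sample labels are illustrative assumptions, not part of the challenge's scoring code.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
score = accuracy(y_true, y_pred)  # 4 of 5 predictions are correct
print(score)
```

Keep in mind that accuracy treats both kinds of error equally, so a model that predicts the majority class for every email can still score well on an imbalanced dataset.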
Submission Format
The submission file has two fields:
- email_index: The unique identifier for each email.
- label: The predicted label for the corresponding email (0 = not spam, 1 = spam).
| email_index | label |
|---|---|
| 123 | 0 |
| 124 | 1 |
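The following sketch shows one way to produce a file with these two fields. It assumes the submission is a comma-separated file with a header row, which this page does not state explicitly, so check the Data section for the exact format. The helper name `write_submission` is hypothetical.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (email_index, label) pairs as CSV rows under a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["email_index", "label"])
    for email_index, label in predictions:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([email_index, label])


# The two rows from the example table, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([(123, 0), (124, 1)], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write to disk instead, pass a file opened with `open("submission.csv", "w", newline="")`; the file name here is only an example.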
The challenge includes two types of submissions:
Public Submission
- Use the public testing dataset (provided in the Data section) to generate your submission file.
Private Submission
- You must submit your model.
- It will be evaluated on a private testing dataset.
- Important: Your code must recursively scan the entire data directory (see the Code section for details).
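A recursive scan can be sketched with `pathlib` as shown below. The `.csv` suffix, the helper name `find_data_files`, and the demo directory layout are assumptions made for illustration; the Code section defines the actual directory and file types.

```python
import tempfile
from pathlib import Path


def find_data_files(data_dir, suffix=".csv"):
    """Return every file under data_dir, at any depth, with the given suffix."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


# Demo on a throwaway directory tree with files at two different depths.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b"
    nested.mkdir(parents=True)
    (Path(tmp) / "top.csv").write_text("email_index,text\n")
    (nested / "deep.csv").write_text("email_index,text\n")
    (nested / "notes.txt").write_text("ignored\n")
    found = [p.relative_to(tmp).as_posix() for p in find_data_files(tmp)]

print(found)
```

Scanning with `rglob` rather than listing a single folder means your code keeps working if the private test files sit in subdirectories you did not see in the public data.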