Email Spam Classification Challenge

Overview

Email spam is one of the most common problems in digital communication, and detecting it is a classic challenge in machine learning. In this competition, your task is to build a model that determines, from an email's text content, whether the email is spam.

You'll work with a labeled dataset containing real-world email messages, giving you the opportunity to practice essential NLP skills such as text preprocessing, feature extraction, and classification. This challenge is beginner-friendly and designed to walk you through the full workflow of building a text-based machine learning model, from exploring the data to making accurate predictions.

Practice Skills

In this challenge, you will gain hands-on experience with:

Python
Natural Language Processing (NLP)
Text preprocessing and feature engineering
Classification algorithms

Evaluation

Goal

Train a classification model that predicts whether each email in the test set is spam or not spam.
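
As a rough illustration of the workflow (preprocessing, feature extraction, classification), here is a minimal sketch of a bag-of-words Naive Bayes classifier in plain Python. The toy emails, the `tokenize` helper, the class name, and the choice of Naive Bayes are all assumptions made for illustration; the challenge does not prescribe a particular model or library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        # Assumes both classes (0 and 1) appear in the training labels.
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict_one(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label in (0, 1):
            # Start from the log prior of the class.
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Toy training data, made up for illustration (0 = not spam, 1 = spam).
train_texts = [
    "Win a free prize now",
    "Cheap loans, click here to claim your free money",
    "Meeting moved to Monday morning",
    "Please review the attached project report",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesSpamClassifier().fit(train_texts, train_labels)
predictions = model.predict(["Claim your free prize", "Project meeting on Monday"])
print(predictions)
```

On real data you would fit the model on the provided training set and likely add steps such as stop-word removal or TF-IDF weighting; this sketch only shows the overall shape of the pipeline.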

Metric

Models will be evaluated using the Accuracy metric:

Accuracy = Correct Predictions / Total Predictions
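
The formula above can be checked locally with a few lines of Python. This is a minimal sketch; the `accuracy` helper and the example labels are hypothetical and are not part of the official scoring code.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 4 of the 5 predictions match the true labels.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.8
```

Note that accuracy weights both classes equally per email, so on an imbalanced dataset a model that always predicts the majority class can still score well.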

Submission Format

Submission guide

The submission file has two fields:

email_index: The unique identifier for each email.
label: The predicted label for the corresponding email (0 = not spam, 1 = spam).

email_index	label
123	0
124	1
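
A file with these two fields could be produced as sketched below. The comma delimiter, the file name `submission.csv`, and the `write_submission` helper are assumptions for illustration; check the Data section for the exact file type the platform expects.

```python
import csv


def write_submission(path, email_indices, labels):
    """Write a header row plus one (email_index, label) row per email."""
    if len(email_indices) != len(labels):
        raise ValueError("email_indices and labels must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["email_index", "label"])
        for idx, label in zip(email_indices, labels):
            if label not in (0, 1):
                raise ValueError(f"label must be 0 or 1, got {label!r}")
            writer.writerow([idx, label])


# Reproduces the two example rows shown above.
write_submission("submission.csv", [123, 124], [0, 1])
```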

The challenge includes two types of submissions:

Public Submission

Use the public testing dataset (provided in the Data section) to generate your submission file.

Private Submission

You must submit your model.
It will be evaluated on a private testing dataset.
Important: Your code must recursively scan the entire data directory (see the Code section for details).
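
One way to scan a directory recursively is `pathlib.Path.rglob`, as in the sketch below. The `find_data_files` helper, the `.csv` suffix, and the demo directory layout are assumptions for illustration; the actual directory path and file types are defined in the Code section.

```python
import tempfile
from pathlib import Path


def find_data_files(data_dir, suffixes=(".csv",)):
    """Return every file under data_dir (at any depth) with a matching suffix."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Demonstration on a throwaway directory tree (the layout is made up).
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "batch_1" / "part_a"
    nested.mkdir(parents=True)
    (Path(tmp) / "top.csv").write_text("email_index,text\n")
    (nested / "deep.csv").write_text("email_index,text\n")
    (nested / "notes.txt").write_text("not a data file\n")
    found = [p.relative_to(tmp).as_posix() for p in find_data_files(tmp)]

print(found)
```

Listing only the top level of the directory would miss `deep.csv` here, which is the kind of omission the recursive-scan requirement guards against.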

AIOZAI

Email Spam Classification Challenge