Start: 23/03/2026

Email Spam Classification Challenge

Building models to identify spam email

Challenge Rewards: knowledge

Participants: 43

Submissions: 19

Email Spam Classification Model

Challenge participants: Build robust email spam classification solutions.

Quick Start

Download code baseline

# 1. Install dependencies (do not add any other libraries to the requirements.txt file)
pip install -r requirements.txt

# 2. Start developing your solution. Follow the tutorial below to implement your AI model

# 3. Run the demo to test your implementation
python demo.py

# 4. Verify Your Submission
python -m my_ai_lib.predict_submission

Note: Implement your email spam classification solution by following the Detailed Tutorial below.

Introduction

The Email Spam Classification Challenge is a beginner-friendly machine learning project where participants predict whether an email is spam or not based on its text content. This task is ideal for practicing text processing, feature extraction, and building simple classification models.

In this challenge, participants will be provided with:

  • Model Access: Email Spam Classification Model.

    • Code baseline to develop solutions.
    • Predefined libraries and tools in requirements.txt (do not add any other libraries to the requirements.txt file).
  • Dataset Access: Email Spam Classification Data.

    • Training dataset to train your AI models.
    • Testing dataset to predict labels for generating the submission file.

Goal: Classify emails as spam or not spam using their text content.
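
To make the goal concrete, here is a minimal sketch of one possible approach: a bag-of-words multinomial Naive Bayes classifier written with the standard library only. The class name TinyNaiveBayes, the tokenizer, and the toy training emails are all hypothetical illustrations, not part of the challenge baseline or dataset.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts: list[str], labels: list[int]) -> "TinyNaiveBayes":
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is every token seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text: str) -> int:
        total_docs = sum(self.class_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy data for illustration only (1 = spam, 0 = not spam).
train_texts = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = [1, 1, 0, 0]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("agenda for the team meeting"))
```

A real solution would train on the provided training dataset and could use any library already listed in requirements.txt instead of this hand-written model.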

Requirements

System Requirements

  • Python 3.10+

Dependencies

Install all required packages:

pip install -r requirements.txt

Project Structure

Your AI library should follow this structure:

repository/
├── my_ai_lib/                   # Your AI library
│   ├── __init__.py              # Required: Library initialization
│   ├── run.py                   # Required: Main workflow function
│   ├── predict_submission.py    # Required: Submission function
│   └── [your_modules]/          # Your custom modules
├── models/                      # Model weights directory
├── demo.py                      # Demo script
├── requirements.txt             # Dependencies
└── README.md                    # Documentation

Key Components

| Component | Description | Status |
| --- | --- | --- |
| my_ai_lib/ | Core AI library directory | Required |
| my_ai_lib/__init__.py | Library initialization | Required |
| my_ai_lib/predict_submission.py | Generate predictions for challenge submission | Required |
| my_ai_lib/run.py | Main AI workflow | Required |
| demo.py | Demo and testing script | Required |

Detailed Tutorial

Step 1: Initialize Your AI Library

1.1 Define my_ai_lib/__init__.py

from .run import run

This import exposes the run() function from run.py as an attribute of the my_ai_lib package, so it can be called as my_ai_lib.run().

1.2 Define Input/Output Objects in my_ai_lib/run.py

Create your custom input and output classes:

from pathlib import Path
from typing import Any, Union, Literal
from aioz_ainode_adapter.schemas import InputObject, OutputObject, FileObject

class MyInput(InputObject):
    input_text: str
    
class MyOutput(OutputObject):
    text: str

Step 2: Understanding AIOZ Schema Objects

The aioz_ainode_adapter library defines 3 core object types based on pydantic.BaseModel:

🔸 InputObject

Defines the format of the input that the AIOZ-AI-Node system sends to your AI library.

Default Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| device | Choice | Device for your model: ["cuda", "cpu", "gpu"] |
| model_storage_directory | String | Directory containing model weights |

Important: Always use model_storage_directory for model weight paths, because AIOZ-AI-Node specifies this location.

🔸 OutputObject

Defines the format of the output that your AI library sends back to the AIOZ-AI-Node system.

🔸 FileObject

Defines the format of a file, for cases where your output includes one. This object has two fields:

| Field | Type | Description |
| --- | --- | --- |
| data | Choice | File data: io.BufferedReader, Path, or URL |
| name | String | File name |

Example FileObject creation:

output_file = FileObject(data=open("file/path.csv", "rb"), name="output.csv")

Note:

  • Input files must be local file paths or URLs
  • Output files must be FileObject instances

Step 3: Implement the Main Workflow

3.1 Define Your AI Task Function

def do_ai_task(
        input_text: str,
        model_storage_directory: Union[str, Path],
        device: Literal["cpu", "cuda", "gpu"] = "cpu",
        *args, **kwargs) -> str:
    """Define AI task: load model, pre-process, post-process, etc ..."""
    # Placeholder: load your trained model from model_storage_directory and
    # replace the constant below with its prediction, for example:
    #     text = str(model.predict(input_text))  # input_text: email, text: label
    text = "0"  # "0" = not spam, "1" = spam
    return text

3.2 Implement the Required run() Function

def run(input_obj: InputObject) -> OutputObject:
    """
    Main entry point for your AI library.
    
    Args:
        input_obj: Input object containing all parameters
        
    Returns:
        OutputObject: Results of AI processing
    """
    try:
        # Validate and parse input
        my_input = MyInput.model_validate(input_obj.model_dump())
        print(f"Input: {my_input}")
        # Execute AI task
        text = do_ai_task(
            input_text=my_input.input_text,
            model_storage_directory=my_input.model_storage_directory,
            device=my_input.device
        )
        # Create output object
        output_obj = MyOutput(text=text)
    except Exception as e:
        raise Exception(e)

    return output_obj

Critical: The run() function name is mandatory and cannot be changed. The do_ai_task() function can be renamed and customized.
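
The validation step in run() relies on pydantic's model_dump() and model_validate(). The sketch below illustrates that round trip with hypothetical stand-in classes, since the real InputObject lives in aioz_ainode_adapter. The extra="allow" setting and the default values for device and model_storage_directory are assumptions made for this example and are not confirmed by this guide.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError


class StandInInputObject(BaseModel):
    """Hypothetical stand-in for aioz_ainode_adapter's InputObject."""

    # Assumption: the generic object keeps fields it does not declare.
    # protected_namespaces=() avoids pydantic's warning about "model_" names.
    model_config = ConfigDict(extra="allow", protected_namespaces=())

    device: Literal["cuda", "cpu", "gpu"] = "cpu"  # assumed default
    model_storage_directory: str = "models"        # assumed default


class StandInMyInput(StandInInputObject):
    """Task-specific input: declares the field the spam task needs."""

    input_text: str


# The caller builds a generic object that carries the extra input_text field.
generic_input = StandInInputObject(input_text="free prize inside")

# run() converts it to a dict and validates it against the typed class.
typed_input = StandInMyInput.model_validate(generic_input.model_dump())
print(typed_input.input_text)

# If the required field is missing, validation fails instead of passing silently.
try:
    StandInMyInput.model_validate(StandInInputObject().model_dump())
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
print(missing_field_rejected)
```

Validating early like this means a malformed request fails at the top of run() with a clear error, rather than somewhere inside the model code.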

Step 4: Create Demo Script

Create demo.py to test your implementation:

import my_ai_lib
from aioz_ainode_adapter.schemas import InputObject

def main():
    """Demo function to test your AI library."""
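    # Sample email text for the demo; replace it with any email you want to test
    email = "Congratulations! You have won a free prize. Click here to claim it."
    input_obj = InputObject(
        input_text=str(email)
    )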
    output_obj = my_ai_lib.run(input_obj)
    print(f"Output: {output_obj}")


if __name__ == '__main__':
    main()

The my_ai_lib.run() function receives an InputObject and returns an OutputObject.

Run this command to test your implementation:

python demo.py

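Expected console output (this sample is from the generic image-task template; for this challenge the input shows an input_text field and the output text holds the predicted label):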

Input: type='InputObj' device='cuda' model_storage_directory='models' input_image='wiki/aioz.png' example_param='example'
Output: type='OutputObj' text='This is the AI task result' output_image=FileObject(type='FileObj', data=<_io.BufferedReader name='wiki/aioz.png'>, name='output_image.png')

Step 5: Add Model Weights

Place your trained model files in the models/ directory:

models/
├── model.pth          # Your trained model
├── config.json        # Model configuration
└── etc.
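
As a rough illustration of how model files in this directory can be located through model_storage_directory, the sketch below saves and loads a toy keyword-weight model as JSON. The file name keyword_model.json, the helper names, and the scoring rule are hypothetical; a real solution would load whatever format its own training code produced (for example model.pth).

```python
import json
import tempfile
from pathlib import Path
from typing import Union

MODEL_FILE_NAME = "keyword_model.json"  # hypothetical file name


def save_model(weights: dict[str, float],
               model_storage_directory: Union[str, Path]) -> Path:
    """Write the toy model into the model storage directory."""
    directory = Path(model_storage_directory)
    directory.mkdir(parents=True, exist_ok=True)
    model_path = directory / MODEL_FILE_NAME
    model_path.write_text(json.dumps(weights), encoding="utf-8")
    return model_path


def load_model(model_storage_directory: Union[str, Path]) -> dict[str, float]:
    """Build the path from model_storage_directory, never from a hard-coded location."""
    model_path = Path(model_storage_directory) / MODEL_FILE_NAME
    if not model_path.is_file():
        raise FileNotFoundError(f"Model file not found: {model_path}")
    return json.loads(model_path.read_text(encoding="utf-8"))


def predict_label(text: str, weights: dict[str, float],
                  threshold: float = 1.0) -> str:
    """Sum the keyword weights and return "1" (spam) or "0" (not spam)."""
    score = sum(weights.get(token, 0.0) for token in text.lower().split())
    return "1" if score >= threshold else "0"


# Demo in a temporary directory so nothing is written into the repository.
with tempfile.TemporaryDirectory() as tmp_dir:
    save_model({"free": 0.8, "prize": 0.7, "meeting": -0.5}, tmp_dir)
    weights = load_model(tmp_dir)
    spam_label = predict_label("free prize inside", weights)
    ham_label = predict_label("meeting at noon", weights)

print(spam_label, ham_label)
```

Inside do_ai_task(), the same idea applies: join the model_storage_directory argument with the file name and load from there.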

Step 6: Create Prediction Script (For Submission)

Implement the predict_submission() function in my_ai_lib/predict_submission.py:

Requirements:

  • Function accepting test data folder path (string)
  • Load your trained model
  • Process test dataset (test.csv in test data folder)
  • Generate predictions
  • Save results as ./result.csv

Implementation Template:

from aioz_ainode_adapter.schemas import InputObject
import my_ai_lib

def predict_submission(test_data_folder: str):
    """
    Generate predictions for challenge submission.
    
    Args:
        test_data_folder: Path to test data directory
    """
    # Load your model
    # Process test data
    # Generate predictions
    # Save to ./result.csv

    # Example:
    # 1. Find test.csv inside the test data folder (for example with os.walk).
    # 2. For each email in test.csv:
    #    - Build the input object:
    #          input_obj = InputObject(input_text=str(email))
    #    - Predict the label with your model:
    #          output = my_ai_lib.run(input_obj)
    #    - Write the predicted label (output.text) to result.csv.
    pass

def main():
    """Main function for testing submission."""
    predict_submission("path/to/test/data")

if __name__ == '__main__':
    main()

Important: The result.csv must match the challenge's sample submission format.
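
The file-handling part of the template could be sketched as follows, under some loudly stated assumptions: test.csv is assumed to contain an email_index column and a text column named "text" (the real column names are not given in this guide), and predict_label is a hypothetical callable standing in for a wrapper around my_ai_lib.run() that returns output.text.

```python
import csv
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def find_test_csv(test_data_folder: str) -> Optional[Path]:
    """Walk the folder tree and return the first test.csv found, if any."""
    for root, _dirs, files in os.walk(test_data_folder):
        if "test.csv" in files:
            return Path(root) / "test.csv"
    return None


def write_predictions(test_data_folder: str,
                      predict_label: Callable[[str], str],
                      result_path: str = "./result.csv",
                      text_column: str = "text") -> int:
    """Predict a label for every row of test.csv and write result.csv.

    text_column is an assumed column name; adjust it to the real dataset.
    Returns the number of prediction rows written.
    """
    test_csv = find_test_csv(test_data_folder)
    if test_csv is None:
        raise FileNotFoundError(f"test.csv not found under {test_data_folder}")

    rows_written = 0
    with open(test_csv, newline="", encoding="utf-8") as src, \
            open(result_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["email_index", "label"])
        for row in reader:
            writer.writerow([row["email_index"], predict_label(row[text_column])])
            rows_written += 1
    return rows_written


# Demo with a temporary folder and a trivial stand-in predictor.
with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir) / "data" / "nested"
    data_dir.mkdir(parents=True)
    (data_dir / "test.csv").write_text(
        "email_index,text\n123,meeting at noon\n124,free prize inside\n",
        encoding="utf-8",
    )
    result_file = Path(tmp_dir) / "result.csv"
    row_count = write_predictions(
        tmp_dir,
        lambda text: "1" if "free" in text else "0",
        result_path=str(result_file),
    )
    result_lines = result_file.read_text(encoding="utf-8").splitlines()

print(row_count, result_lines)
```

Passing the predictor in as a callable keeps the file handling separate from the model, so the same function works whether the label comes from my_ai_lib.run() or from a quick test stub.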

Verify Your Submission:

python -m my_ai_lib.predict_submission

Submission Guidelines

Submission format

The submission file has two fields:

  • email_index: The unique identifier for each email.
  • label: The predicted label for the corresponding email (0 = not spam, 1 = spam).

Example:

email_index, label
123, 0
124, 1
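
Before submitting, a quick local sanity check of result.csv can catch format mistakes. The sketch below is a hypothetical helper, not the challenge's official checker: it verifies the header, the field count, duplicate indices, and that each label is 0 or 1. It tolerates a space after the comma, as in the example above; whether the official checker does the same is not stated here.

```python
import csv
import io

EXPECTED_HEADER = ["email_index", "label"]


def validate_submission(csv_text: str) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]  # ignore blank lines
    if not rows:
        return ["file is empty"]

    problems = []
    header = [cell.strip() for cell in rows[0]]
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")

    seen_indices = set()
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != 2:
            problems.append(f"row {row_number}: expected 2 fields, found {len(row)}")
            continue
        email_index, label = (cell.strip() for cell in row)
        if email_index in seen_indices:
            problems.append(f"row {row_number}: duplicate email_index {email_index}")
        seen_indices.add(email_index)
        if label not in {"0", "1"}:
            problems.append(f"row {row_number}: label must be 0 or 1, found {label!r}")
    return problems


good_problems = validate_submission("email_index, label\n123, 0\n124, 1\n")
bad_problems = validate_submission("email_index,label\n123,2\n123,1\n")
print(good_problems)
print(bad_problems)
```

To check a real file, read it first, for example with Path("result.csv").read_text(encoding="utf-8"), and pass the text to validate_submission().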

License

This repository is licensed under the MIT License.