
Musical Instrument Classification
Musical instrument classification is the task of automatically recognizing and categorizing musical instruments. While the task is often performed on audio recordings or spectrograms, the model described here works from images, identifying the visual characteristics associated with each instrument to determine its class.
Summary
Introduction
Musical instrument classification is the task of categorizing musical instruments based on visual information extracted from images. This task involves developing models and algorithms that can automatically identify and classify different types of musical instruments solely based on their visual characteristics.
By analyzing the visual features, shapes, textures, and structural properties of musical instruments captured in images, the classification models aim to accurately recognize and differentiate between instruments such as piano, guitar, violin, drums, saxophone, and more. This task combines elements of computer vision, image processing, and machine learning to extract relevant visual features and train models capable of robust instrument classification.
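As a rough illustration of that pipeline, a classifier like this one could be fine-tuned and exported with fastai (listed in the requirements below). This is only a sketch: the dataset path, backbone, and export name are assumptions, not the authors' exact setup.

from fastai.vision.all import (
    ImageDataLoaders, Resize, error_rate, resnet18, vision_learner,
)

# Hypothetical dataset layout: one sub-folder of images per instrument label.
dls = ImageDataLoaders.from_folder(
    "instruments/",         # assumed dataset root, not from the original repo
    valid_pct=0.2,          # hold out 20% of the images for validation
    item_tfms=Resize(224),  # resize every image to 224x224
)

# Fine-tune an ImageNet-pretrained backbone on the instrument images.
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

# Export the learner so it can later be restored with load_learner().
learn.export("export.pkl")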
The model predicts one of the following labels (spelled as in the source dataset): didgeridoo, tambourine, xylophone, acordian, alphorn, bagpipes, banjo, bongo drum, casaba, castanets, clarinet, clavichord, concertina, drums, dulcimer, flute, guiro, guitar, harmonica, harp, marakas, ocarina, piano, saxaphone, sitar, steel drum, trombone, trumpet, tuba, violin.
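For reference, the label tuple that the inference code below abbreviates can be written out from this list (the ordering shown is an assumption):

# Full label set, spelled exactly as in the list above (ordering assumed).
categories = (
    'didgeridoo', 'tambourine', 'xylophone', 'acordian', 'alphorn',
    'bagpipes', 'banjo', 'bongo drum', 'casaba', 'castanets',
    'clarinet', 'clavichord', 'concertina', 'drums', 'dulcimer',
    'flute', 'guiro', 'guitar', 'harmonica', 'harp',
    'marakas', 'ocarina', 'piano', 'saxaphone', 'sitar',
    'steel drum', 'trombone', 'trumpet', 'tuba', 'violin',
)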
Parameters
Inputs
- input - (image: .png|.jpg|.jpeg): An image of a musical instrument.
Output
- output - (text): The model's predicted probability for each label.
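For instance, the returned dictionary for a guitar image might look like the following (the probabilities are illustrative, and most entries are omitted):

output = {
    'acordian': 0.0012,
    'banjo': 0.0308,
    'guitar': 0.9265,
    # ... one entry per remaining label ...
}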
Examples
| input | output |
| --- | --- |
| ![]() | ![]() |
Usage for developers
The details below cover the requirements and the code used to run the model on our platform.
Requirements
torch
fastai
opencv-python
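These dependencies can be installed with, for example, `pip install torch fastai opencv-python`.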
Code based on AIOZ structure
from typing import Any, Literal, Union
from pathlib import Path
from fastai.vision.all import load_learner
import cv2, torch, os
...
def do_ai_task(
        context: Union[str, Path],
        answer: Union[str, Path],
        model_storage_directory: Union[str, Path],
        device: Literal["cpu", "cuda", "gpu"] = "cpu",
        *args, **kwargs) -> Any:
    # Locate the exported fastai learner inside the model storage directory.
    model_id = os.path.abspath(os.path.join(model_storage_directory, "..."))
    categories = ('didgeridoo', 'tambourine', ...)
    learn = load_learner(model_id)
    # Move the model to the GPU when one is available.
    if torch.cuda.is_available():
        learn.model.cuda()
    # OpenCV loads images as BGR; convert to RGB before prediction.
    image = cv2.cvtColor(cv2.imread(str(context)), cv2.COLOR_BGR2RGB)
    pred, idx, probs = learn.predict(image)
    # Pair each label with its predicted probability.
    output = dict(zip(categories, map(float, probs)))
    return output
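A minimal, hypothetical invocation might look like this (the paths are placeholders, not values from the original repository):

# Hypothetical call; the image and model paths are placeholders.
probabilities = do_ai_task(
    context="samples/guitar.jpg",       # input image (assumed path)
    answer="outputs/prediction.json",   # answer location (assumed path)
    model_storage_directory="models/",  # directory holding the exported learner
)
# Print the most likely instrument.
print(max(probabilities, key=probabilities.get))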
Reference
This repository is based on and inspired by Neeraj Handa's work. We sincerely appreciate their generosity in sharing the code.
License
We respect and comply with the terms of the author's license cited in the Reference section.