All Models

AIOZ AI

Iris Flower Classification Challenge

Model for Iris Flower Classification Challenge

MIT

Tabular Classification

PyTorch

Transformers

English

by @AIOZAI

Supertonic 2

Most TTS systems make you choose between speed, quality, and privacy. Supertonic 2 refuses that tradeoff: a featherweight 66M-parameter model that runs 167× faster than real time, entirely on your device, with zero network dependency. Powered by ONNX Runtime, it deploys across 11 platforms from iOS to Rust to the browser, supports 5 languages, and correctly reads complex real-world expressions that trip up major cloud TTS APIs.

OpenRAIL

Text-to-Speech

PyTorch

ONNX

English

French

Spanish

Korean

Portuguese

by @AIOZAI
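
The "167× faster than real time" figure is a ratio of audio duration to compute time. The sketch below, with hypothetical helper names, shows that arithmetic together with the inverse real-time factor (RTF) that TTS benchmarks often report; it does not call Supertonic itself.

```python
def speedup_vs_realtime(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def real_time_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """RTF = compute time / audio duration; values below 1 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Expected compute time for a clip at a given speedup over real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    # At 167x real time, a 10-minute (600 s) narration needs about 3.59 s of compute.
    print(round(synthesis_time(600.0, 167.0), 2))
    # The same speed expressed as an RTF is roughly 0.006.
    print(round(real_time_factor(167.0, 1.0), 4))
```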

Qwen3-TTS

Qwen3-TTS is a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control.

Apache-2.0

Text-to-Speech

PyTorch

Safetensors

English

by @AIOZAI

FireRed-Image-Edit-1.1

Edit anything in an image with a simple text instruction. FireRed-Image-Edit covers the full spectrum of image editing, from swapping backgrounds and retouching portraits to restoring old photos and performing virtual try-on across multiple images. With leading benchmark scores among open-source models and bilingual Chinese–English instruction support, it is built for both research and real-world applications.

Apache-2.0

Image-to-Image

PyTorch

Safetensors

Diffusers

English

Chinese

by @AIOZAI

Helios-Distilled

Generate up to 60 seconds of high-quality video from a text prompt, a single image, or an existing video clip, all at real-time speed on a single H100. Helios-Distilled is the most efficient variant of the 14B Helios family, distilled to 3 inference steps while maintaining strong visual coherence and temporal consistency. No KV cache, no quantization, no anti-drifting hacks: just fast, clean video generation out of the box.

Apache-2.0

Text-to-Video

PyTorch

Safetensors

Diffusers

English

by @AIOZAI

Email Spam Classification Challenge

Model for Email Spam Classification Challenge

MIT

Text Classification

Keras

English

by @AIOZAI

Pneumonia Chest X-Ray Classification Challenge

Model for Pneumonia Chest X-Ray Classification Challenge

MIT

Image Classification

Keras

English

by @AIOZAI

Decoupled DMD Distillation

Decoupled DMD reveals that Distribution Matching Distillation's success in few-step diffusion models comes from two distinct mechanisms: CFG Augmentation as the core engine for multi-to-few-step conversion, and Distribution Matching as a regularizer for training stability. This new understanding enables principled improvements through decoupled noise schedules, achieving state-of-the-art performance in efficient image generation and powering the top-tier 8-step Z-Image model.

Apache-2.0

PyTorch

English

by @ankara-messi-6866
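
Classifier-free guidance (CFG), the ingredient behind the "CFG Augmentation" mechanism described above, combines an unconditional and a conditional noise prediction using a guidance scale w. The sketch below shows only this standard combination on plain Python lists; the function name is hypothetical, and how Decoupled DMD schedules guidance and noise during distillation is not reproduced here.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float], eps_cond: Sequence[float], guidance_scale: float
) -> List[float]:
    """Standard classifier-free guidance:

        eps = eps_uncond + w * (eps_cond - eps_uncond)

    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates further in the direction of the condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    print(cfg_combine([0.0, 1.0], [1.0, 1.0], 4.0))  # [4.0, 1.0]
```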

Face Anti-Spoofing Challenge

Model for Face Anti-Spoofing Challenge

Other

Image Classification

English

by @AIOZAI

Background Removal

Background Removal is an image processing technique that separates the main subject from the background of a photo. Removing the background highlights the product, subject, or character and gives the image a clean, professional look.

Apache-2.0

Image-to-Image

PyTorch

English

by @AIOZAI
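
Once a background-removal model has produced a soft foreground mask (alpha matte), swapping in a new background is a per-pixel alpha composite. The sketch below assumes the mask is already available as values in [0, 1] and represents images as nested lists of RGB tuples; the function name and data layout are illustrative, not this model's interface.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def replace_background(
    foreground: Sequence[Sequence[RGB]],
    alpha: Sequence[Sequence[float]],
    color: RGB,
) -> List[List[RGB]]:
    """Composite the foreground over a solid color: out = a * fg + (1 - a) * bg."""
    if len(foreground) != len(alpha):
        raise ValueError("image and mask heights differ")
    result: List[List[RGB]] = []
    for fg_row, a_row in zip(foreground, alpha):
        if len(fg_row) != len(a_row):
            raise ValueError("image and mask widths differ")
        out_row: List[RGB] = []
        for pixel, a in zip(fg_row, a_row):
            a = min(max(a, 0.0), 1.0)  # clamp mask values into [0, 1]
            out_row.append(tuple(a * f + (1.0 - a) * c for f, c in zip(pixel, color)))
        result.append(out_row)
    return result


if __name__ == "__main__":
    red = (255.0, 0.0, 0.0)
    image = [[red, red, red]]
    mask = [[1.0, 0.5, 0.0]]  # subject, soft edge, background
    print(replace_background(image, mask, (0.0, 0.0, 255.0)))
```

A soft mask value such as 0.5 blends the subject's edge with the new background, which avoids hard, jagged cutout borders.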

SHARP Monocular View Synthesis

SHARP (Single-image High-Accuracy Real-time Parallax) is a photorealistic view synthesis model that generates high-quality 3D Gaussian representations from a single photograph in under one second. It supports real-time rendering of nearby views at over 100 FPS with metric camera movements, achieving state-of-the-art performance by reducing LPIPS by 25-34% and DISTS by 21-43% compared to prior models while being 1000× faster.

MIT

PyTorch

English

by @trassi-1990

LTX-2 Audiovisual Model

LTX-2 is an open-source audiovisual foundation model that generates high-quality, temporally synchronized video and audio content from text prompts. It features an asymmetric dual-stream transformer architecture (14B video + 5B audio parameters) with bidirectional cross-attention, achieving state-of-the-art quality among open-source systems while being 18× faster than comparable models and supporting up to 20 seconds of continuous generation.

Apache-2.0

Text-to-Video

PyTorch

Transformers

Multilingual

by @jonecarries-1954
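
The bidirectional cross-attention mentioned above lets each stream read from the other: video tokens attend over audio tokens, and audio tokens attend over video tokens. The toy sketch below shows that data flow with scaled dot-product attention on small lists. It assumes both streams share one hidden size and omits the learned query/key/value projections, multiple heads, and timing alignment that a real model such as LTX-2 would use; all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Each query token attends over every context token.

    Keys and values are the context vectors themselves (no learned projections).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def add(x: Matrix, y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def bidirectional_cross_attention(video: Matrix, audio: Matrix) -> Tuple[Matrix, Matrix]:
    """One exchange step: both updates are computed from the input streams,
    then added back as residual connections."""
    video_from_audio = cross_attention(video, audio)
    audio_from_video = cross_attention(audio, video)
    return add(video, video_from_audio), add(audio, audio_from_video)


if __name__ == "__main__":
    video_tokens = [[1.0, 0.0], [0.0, 1.0]]
    audio_tokens = [[2.0, 3.0]]
    new_video, new_audio = bidirectional_cross_attention(video_tokens, audio_tokens)
    print(new_video)  # [[3.0, 3.0], [2.0, 4.0]]
    print(new_audio)
```

Computing both updates from the unmodified inputs keeps the exchange symmetric, so neither stream sees the other's already-updated state within the same step.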

MiroThinker

MiroThinker is an open-source research agent that advances tool-augmented reasoning through model, context, and interactive scaling. It achieves state-of-the-art performance among open-source agents by enabling up to 600 tool calls per task within a 256K context window, demonstrating that interaction depth is a critical dimension for building next-generation research agents alongside model size and context length.

Other

PyTorch

ONNX

English

Chinese

by @draven-2890

Image Super-Resolution with SMFANet

Image Super-Resolution with SMFANet uses the SMFANet architecture to increase the resolution and quality of images. SMFANet is a deep learning network designed for super-resolution, generating high-quality, detailed images from low-resolution inputs.

Apache-2.0

Image-to-Image

PyTorch

English

by @AIOZAI
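
Learned super-resolution networks such as SMFANet are typically compared against classical interpolation. The sketch below implements bilinear upscaling of a grayscale image (nested lists, half-pixel sampling convention) as that non-learned baseline; it is not SMFANet, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def bilinear_upscale(image: Sequence[Sequence[float]], scale: int) -> List[List[float]]:
    """Upscale a grayscale image by an integer factor with bilinear interpolation."""
    if scale < 1:
        raise ValueError("scale must be >= 1")
    h, w = len(image), len(image[0])
    out: List[List[float]] = []
    for i in range(h * scale):
        # Map the output pixel center back into input coordinates, clamped to the image.
        y = min(max((i + 0.5) / scale - 0.5, 0.0), h - 1)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row: List[float] = []
        for j in range(w * scale):
            x = min(max((j + 0.5) / scale - 0.5, 0.0), w - 1)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate horizontally on two rows, then vertically between them.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


if __name__ == "__main__":
    print(bilinear_upscale([[0.0, 10.0]], 2))  # [[0.0, 2.5, 7.5, 10.0], [0.0, 2.5, 7.5, 10.0]]
```

Interpolation can only smooth between existing samples; it cannot recover fine texture, which is the gap learned super-resolution models aim to close.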

Low-light Image Enhancement

Low-light image enhancement improves the quality and visibility of images captured in low-light conditions. It applies image processing techniques and algorithms to recover detail, reduce noise, and increase brightness in photos taken in dimly lit environments.

Apache-2.0

Image-to-Image

PyTorch

English

by @AIOZAI
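
A classical, non-learned baseline for brightening dark photos is gamma correction, which lifts shadows more than highlights. The sketch below applies it to 8-bit grayscale values stored in nested lists. It does not reduce noise (it actually amplifies noise in dark regions), which is one reason learned low-light models are used instead; the function name is hypothetical.

```python
from typing import List, Sequence


def gamma_brighten(image: Sequence[Sequence[float]], gamma: float) -> List[List[float]]:
    """Apply out = 255 * (in / 255) ** gamma to 8-bit intensities.

    gamma < 1 brightens (dark values rise the most), gamma > 1 darkens,
    and black (0) and white (255) stay fixed.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [
        [255.0 * (min(max(p, 0.0), 255.0) / 255.0) ** gamma for p in row]
        for row in image
    ]


if __name__ == "__main__":
    dark = [[0.0, 16.0, 64.0, 255.0]]
    print([round(v, 1) for v in gamma_brighten(dark, 0.5)[0]])
```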

MediaPipe Face Detection

Face detection is a computer vision technique for identifying and locating human faces in an image or video. Its goal is to detect the presence of faces and draw bounding boxes around them, without necessarily identifying specific facial features or landmarks.

MIT

Object Detection

PyTorch

English

by @AIOZAI
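
Detectors typically output several overlapping candidate boxes per face, which are then filtered by overlap. The sketch below computes intersection-over-union (IoU) for boxes given as (x_min, y_min, x_max, y_max) and applies plain greedy non-maximum suppression (NMS). MediaPipe's own post-processing may use a different variant, and the function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area; 0.0 when the boxes do not overlap."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidates = [(0, 0, 2, 2), (0, 0, 2, 2.2), (5, 5, 6, 6)]
    print(nms(candidates, [0.9, 0.8, 0.7]))  # [0, 2]
```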

MediaPipe Face Mesh Plotting

Face mesh detection, also known as facial landmark detection, is the task of identifying and localizing specific keypoints or landmarks on a human face. It detects the positions of facial features such as the eyes, eyebrows, nose, mouth, and jawline in an image or video.

MIT

Object Detection

PyTorch

English

by @AIOZAI
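
MediaPipe Face Mesh reports landmark x and y coordinates normalized to [0, 1] by the image width and height, so plotting them requires scaling to pixels. The sketch below assumes landmarks are plain (x, y) tuples rather than MediaPipe's landmark objects, clamps points that fall slightly outside the frame, and derives a bounding box from the points; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_pixels(
    landmarks: Sequence[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale normalized (x, y) landmarks to integer pixel coordinates inside the image."""
    points = []
    for x, y in landmarks:
        px = min(max(int(x * width), 0), width - 1)
        py = min(max(int(y * height), 0), height - 1)
        points.append((px, py))
    return points


def bounding_box(points: Sequence[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Smallest (x_min, y_min, x_max, y_max) box containing all points."""
    if not points:
        raise ValueError("no points")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    pts = to_pixels([(0.5, 0.5), (1.0, 0.0), (0.25, 0.75)], 640, 480)
    print(pts)                # [(320, 240), (639, 0), (160, 360)]
    print(bounding_box(pts))  # (160, 0, 639, 360)
```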

Image Blending with Multiple Methods

Image Blending with Multiple Methods combines two or more images into a seamless composite using a variety of blending techniques, such as alpha blending, gradient blending, or Laplacian pyramid blending, while preserving the visual coherence and integrity of the final composition.

Apache-2.0

Image-to-Image

PyTorch

English

by @AIOZAI
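
Two of the methods named above are easy to state per pixel: alpha blending mixes two images with a constant weight, and gradient blending lets that weight ramp across the image so one picture fades into the other. The sketch below shows both on grayscale nested lists of equal size. Laplacian pyramid blending, which applies such masks separately at several frequency bands, is left out for brevity, and the function names are hypothetical.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def alpha_blend(a: Image, b: Image, alpha: float) -> List[List[float]]:
    """Constant-weight mix: out = alpha * a + (1 - alpha) * b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [[alpha * pa + (1 - alpha) * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def gradient_blend(a: Image, b: Image) -> List[List[float]]:
    """Horizontal linear ramp: column 0 is pure a, the last column is pure b."""
    out: List[List[float]] = []
    for ra, rb in zip(a, b):
        w = len(ra)
        row = []
        for j, (pa, pb) in enumerate(zip(ra, rb)):
            t = j / (w - 1) if w > 1 else 0.0  # weight of b at this column
            row.append((1 - t) * pa + t * pb)
        out.append(row)
    return out


if __name__ == "__main__":
    print(alpha_blend([[10.0]], [[20.0]], 0.25))                            # [[17.5]]
    print(gradient_blend([[0.0, 0.0, 0.0]], [[100.0, 100.0, 100.0]]))  # [[0.0, 50.0, 100.0]]
```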

Video to Canny Edge

Video to Canny Edge converts a video into a Canny edge representation, in which each frame is reduced to its detected edges. The Canny edge detector is a popular image processing algorithm for detecting edges in images and videos.

MIT

Image-to-Image

PyTorch

English

by @AIOZAI
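
After computing gradient magnitudes and thinning edges with non-maximum suppression, Canny keeps pixels using two thresholds: strong pixels (at or above the high threshold) are always edges, and weak pixels (between the thresholds) survive only if they connect to a strong one. The sketch below shows just this hysteresis step on a precomputed magnitude grid; for video it would run once per frame. OpenCV exposes the same two thresholds as the `threshold1` and `threshold2` arguments of `cv2.Canny`. The function name here is hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def hysteresis_threshold(
    magnitude: Sequence[Sequence[float]], low: float, high: float
) -> List[List[bool]]:
    """Return an edge mask: strong pixels plus weak pixels 8-connected to them."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue: Deque[Tuple[int, int]] = deque()
    # Seed the search with every strong pixel.
    for i in range(h):
        for j in range(w):
            if magnitude[i][j] >= high:
                edges[i][j] = True
                queue.append((i, j))
    # Grow edges breadth-first into neighboring weak pixels.
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and not edges[ni][nj] and magnitude[ni][nj] >= low:
                    edges[ni][nj] = True
                    queue.append((ni, nj))
    return edges


if __name__ == "__main__":
    mag = [[0, 50, 0, 0, 50],
           [0, 100, 60, 0, 0]]
    # The isolated weak pixel at the top right is discarded.
    print(hysteresis_threshold(mag, low=40, high=80))
```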

XFeat: Accelerated Features for Lightweight Image Matching

This task improves the efficiency of image matching by using lightweight yet highly discriminative features. XFeat uses optimized feature extraction to identify keypoints and patterns in images, making it suitable for fast and accurate image comparison.

Apache-2.0

Image Feature Extraction

PyTorch

English

by @AIOZAI
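
A common way to turn keypoint descriptors from two images into matches is a mutual-nearest-neighbor check on cosine similarity: a pair is kept only if each descriptor is the other's best match. The sketch below assumes descriptors are plain lists of floats and uses a hypothetical `min_similarity` cutoff; it illustrates the generic matching step, not XFeat's exact code.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mutual_nearest_neighbors(
    desc_a: Sequence[Sequence[float]],
    desc_b: Sequence[Sequence[float]],
    min_similarity: float = 0.8,
) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where a[i] and b[j] are each other's most similar descriptor."""
    if not desc_a or not desc_b:
        return []
    a = [_normalize(d) for d in desc_a]
    b = [_normalize(d) for d in desc_b]
    # Cosine similarity matrix: dot products of unit vectors.
    sim = [[sum(x * y for x, y in zip(da, db)) for db in b] for da in a]
    best_for_a = [max(range(len(b)), key=row.__getitem__) for row in sim]
    best_for_b = [max(range(len(a)), key=lambda i, j=j: sim[i][j]) for j in range(len(b))]
    matches = []
    for i, j in enumerate(best_for_a):
        # Keep the pair only if the choice is mutual and similar enough.
        if best_for_b[j] == i and sim[i][j] >= min_similarity:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    print(mutual_nearest_neighbors([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.1]]))
    # [(0, 1), (1, 0)]
```

The mutual check discards one-sided matches cheaply, which suits a lightweight pipeline where descriptors are compared by dot products alone.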
