All Models

AIOZ AI
License_Plate_Model

License Plate Recognition Challenge

Model for License Plate Recognition Challenge

z-image-turbo

Z-Image Turbo

Z-Image-Turbo is a 6B-parameter text-to-image diffusion model distilled from the Z-Image foundation model, producing high-fidelity images in only 8 NFEs (Number of Function Evaluations). It delivers sub-second inference latency on H800-class GPUs and fits within 16 GB of VRAM on consumer hardware, while preserving strong photorealism, bilingual (English/Chinese) text rendering, and reliable instruction following.
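
As a rough sanity check on the 16 GB figure, the weight footprint of a 6B-parameter model is parameter count times bytes per parameter. This is a back-of-the-envelope sketch. It assumes bf16 weights (2 bytes each), shows fp32 for comparison, and ignores activations, the text encoder, and the VAE, so real peak usage will be higher.

```python
def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3

# Assumption: 6B parameters; bf16 = 2 bytes/param, fp32 = 4 bytes/param.
for dtype, nbytes in [("bf16", 2), ("fp32", 4)]:
    print(f"{dtype}: {weight_gib(6e9, nbytes):.2f} GiB")
# bf16: 11.18 GiB
# fp32: 22.35 GiB
```

Under these assumptions, bf16 weights alone take about 11.18 GiB, leaving some headroom below 16 GB, while fp32 weights would not fit at all.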

qwq-32b

QwQ-32B

QwQ-32B is a 32.5B-parameter causal reasoning model from the Qwen series, post-trained with supervised fine-tuning and reinforcement learning to think explicitly before answering. Despite its mid-range size, it delivers performance competitive with leading reasoning systems such as DeepSeek-R1 and o1-mini, particularly on hard math, coding, and multi-step problems. It supports a full 131,072-token context, with YaRN scaling required for prompts longer than 8,192 tokens, and works best with non-greedy sampling (temperature 0.6, top-p 0.95, top-k between 20 and 40).
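
The recommended settings combine three standard decoding controls. The sketch below applies them in the common order (temperature, then top-k, then top-p) to a plain list of logits. It is a generic illustration, not QwQ's or any inference engine's implementation; the toy logits are made up, and top_k=20 simply picks the low end of the recommended range.

```python
import math
import random

def sample_token(logits, temperature=0.6, top_k=20, top_p=0.95, rng=random):
    """Temperature + top-k + top-p (nucleus) sampling; returns a token index."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most likely token ids, highest probability first.
    candidates = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]

    # Renormalise over the top-k set, then apply top-p: keep the smallest
    # prefix whose cumulative probability reaches top_p.
    k_mass = sum(probs[i] for i in candidates)
    kept, cumulative = [], 0.0
    for i in candidates:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Draw one surviving token id in proportion to its probability.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical 5-token vocabulary
print([sample_token(logits, rng=rng) for _ in range(8)])
```

Lower temperature sharpens the distribution before the top-k and top-p cuts are applied, so the three knobs interact rather than acting independently.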

phi-4-instruct

Phi-4

Phi-4 is Microsoft's 14B-parameter dense decoder-only language model, trained on ~9.8T tokens of synthetic, textbook-quality, and curated web data with a 16K-token context window. It is engineered for reasoning and logic in memory- or latency-constrained deployments, matching or surpassing much larger models on math (MATH: 80.4) and science (GPQA: 56.1) benchmarks. It is released under the permissive MIT license and aligned with supervised fine-tuning (SFT) and direct preference optimization (DPO) for instruction following and safety.
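
DPO, used in Phi-4's alignment, optimizes a simple pairwise objective: it rewards the policy for raising the likelihood of a preferred response relative to a frozen reference model. The sketch below computes the standard DPO loss for one (chosen, rejected) pair from sequence log-probabilities. It illustrates the published DPO objective in general, not Phi-4's training code, and beta = 0.1 is an arbitrary illustrative value.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is a summed sequence log-probability log pi(y | x) under the
    trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favours the chosen response over the rejected one.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin)
    return softplus(-beta * margin)

print(round(dpo_loss(-12.0, -15.0, -13.0, -14.0), 4))  # 0.5981
```

When the policy equals the reference the margin is zero and the loss is log 2; it falls below that as the policy shifts probability toward the chosen response.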

qwen3-coder-next

Qwen3-Coder-Next

Qwen3-Coder-Next is an open-weight coding model from the Qwen team that activates just 3B of its 80B parameters per token, matching the performance of models 10–20× larger at a fraction of the inference cost. It pairs a hybrid Gated DeltaNet + Gated Attention MoE architecture with a native 262,144-token context, and is purpose-built for agentic coding tasks such as long-horizon reasoning, tool use, and failure recovery, with out-of-the-box support for Claude Code, Qwen Code, Qoder, Kilo, Trae, and Cline.
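
The "3B active out of 80B total" figure comes from sparse mixture-of-experts routing: a gate scores every expert for each token, but only the top few are executed. Below is a generic top-k routing sketch with toy scalar experts. The expert count, k, and softmax-over-selected gating are illustrative assumptions, not Qwen3-Coder-Next's actual configuration, and the Gated DeltaNet and Gated Attention layers are not modelled. As a rough ratio, 3B / 80B means about 3.75% of the parameters participate per token.

```python
import math

def top_k_route(gate_logits, k=2):
    """Select the k highest-scoring experts and softmax-normalise their gate scores."""
    chosen = sorted(range(len(gate_logits)), key=gate_logits.__getitem__, reverse=True)[:k]
    peak = max(gate_logits[i] for i in chosen)
    exps = {i: math.exp(gate_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_logits, k=2):
    """Combine only the selected experts; unselected experts are never called."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy setup (hypothetical): 8 scalar "experts", each just scales its input.
experts = [lambda x, s=s: s * x for s in range(8)]
gate_logits = [0.1, -1.0, 0.3, 2.0, 0.0, 2.0, -0.5, 0.2]
print(moe_forward(1.0, experts, gate_logits, k=2))  # 4.0
print(f"{3 / 80:.2%}")                              # 3.75%
```

Here experts 3 and 5 tie for the highest gate score, each receives weight 0.5, and the other six experts cost nothing for this token.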

Spaceship_Titanic_Model

Spaceship Titanic Prediction Challenge

Model for Spaceship Titanic Prediction Challenge

heartmula

HeartMuLa

Most open-source music models offer a single capability. HeartMuLa offers four: a lyrics-conditioned song generator, a high-fidelity music codec, a lyrics transcription model, and an audio-text alignment model, all open-sourced together as a coherent foundation. The 3B generator handles multilingual lyrics in English, Chinese, Japanese, Korean, and Spanish, with style controlled through simple comma-separated tags. According to the authors, an internal 7B version already reaches Suno-level quality, and an open 7B release is planned.

ui_venus_1_5

UI-Venus 1.5

Give UI-Venus 1.5 a natural language instruction and a screenshot — it will find the right button, navigate the interface, and complete the task, just like a human would. No accessibility APIs, no DOM parsing, no special permissions needed. The unified 2B/8B/30B-A3B model family achieves state-of-the-art results on major GUI benchmarks including AndroidWorld (77.6%) and ScreenSpot-Pro (69.6%), with a full Android automation framework supporting 40+ mainstream apps out of the box.

Text_Generation_with_SmolLM-135M

Text Generation with SmolLM-135M

Text Generation with SmolLM-135M uses a compact language model with 135 million parameters to generate text automatically. Despite its small size, the model produces coherent, well-structured text.
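
Whatever its size, a causal language model like SmolLM-135M generates text one token at a time, feeding each prediction back in as input. The loop below sketches greedy decoding against a hypothetical toy bigram table (`BIGRAMS`, `toy_model`) standing in for the real network; with the actual model you would use its next-token logits instead, and often a sampling strategy rather than greedy argmax.

```python
def greedy_generate(next_token_scores, prompt, max_new_tokens=5, eos="<eos>"):
    """Autoregressive greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # dict: candidate token -> score
        best = max(scores, key=scores.get)
        if best == eos:
            break
        tokens.append(best)
    return tokens

# Hypothetical toy bigram table standing in for a real language model.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "sat": {"down": 0.8, "<eos>": 0.2},
    "down": {"<eos>": 1.0},
}

def toy_model(tokens):
    """Score the next token using only the last token (unknown tokens end the text)."""
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})

print(" ".join(greedy_generate(toy_model, ["the"])))  # the cat sat down
```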

Melanoma_Skin_Cancer_Model

Melanoma Skin Cancer Classification Challenge

Model for Melanoma Skin Cancer Classification Challenge

deepseek_ocr_coc

DeepSeek-OCR

DeepSeek-OCR reimagines optical character recognition as a context compression problem — treating visual documents not as images to scan, but as information to compress and decode through an LLM-centric vision encoder. It converts documents, PDFs, and images to clean markdown, extracts text with layout awareness, parses figures, and localizes specific elements by reference — all at ~2500 tokens per second on a single A100 with vLLM. Multiple resolution modes from 64 to 400+ vision tokens let you tune the quality-speed tradeoff for your use case.
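
The context-compression framing can be made concrete with a ratio: how many text tokens each vision token has to carry. The arithmetic below uses a hypothetical 1,000-token page and example budgets of 64, 100, 256, and 400 vision tokens within the 64-to-400+ range. It also assumes the ~2500 tokens/s figure applies to a single page, which is optimistic if that number is an aggregate batched throughput.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token."""
    return text_tokens / vision_tokens

def decode_seconds(text_tokens: int, tokens_per_second: float = 2500.0) -> float:
    """Rough time to emit the text at a given decode throughput."""
    return text_tokens / tokens_per_second

page_text_tokens = 1000  # hypothetical page length
for budget in (64, 100, 256, 400):
    ratio = compression_ratio(page_text_tokens, budget)
    print(f"{budget:>3} vision tokens -> {ratio:.1f}x compression")
print(f"~{decode_seconds(page_text_tokens):.2f} s to emit the page at 2500 tok/s")
# 64 -> 15.6x, 100 -> 10.0x, 256 -> 3.9x, 400 -> 2.5x; ~0.40 s
```

Smaller vision budgets mean higher compression and cheaper encoding, at the cost of more text that must be reconstructed from each vision token, which is the quality-speed tradeoff the resolution modes expose.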

lightrag

LightRAG

LightRAG is a simple, fast, and powerful RAG system that goes beyond chunk retrieval by automatically building a knowledge graph from your documents — then querying both the graph and vector store simultaneously for richer, more contextually aware answers. Published at EMNLP 2025 and trusted by 29k+ developers, it works with any LLM, supports production-grade storage backends, and ships with a Web UI featuring live knowledge graph visualization.
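
The idea of querying a knowledge graph and a vector store together can be sketched in a few lines. This is a toy illustration of dual retrieval, not LightRAG's API: the embeddings, entity graph, one-hop expansion, and merge rule are all hypothetical simplifications. LightRAG itself extracts entities and relations with an LLM and offers several query modes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query_vec, query_entities, chunk_vecs, graph, entity_chunks, top_k=2):
    """Union of vector-similar chunks and chunks linked to graph neighbours of query entities."""
    # Vector side: rank chunks by cosine similarity to the query embedding.
    ranked = sorted(chunk_vecs, key=lambda c: cosine(query_vec, chunk_vecs[c]), reverse=True)
    vector_hits = ranked[:top_k]

    # Graph side: expand query entities by one hop, then collect chunks mentioning them.
    entities = set(query_entities)
    for e in query_entities:
        entities.update(graph.get(e, ()))
    graph_hits = sorted({c for e in entities for c in entity_chunks.get(e, ())})

    # Merge: vector hits first, then graph hits, without duplicates.
    return list(dict.fromkeys(vector_hits + graph_hits))

# Hypothetical toy index.
chunk_vecs = {"c1": [1.0, 0.0], "c2": [0.9, 0.1], "c3": [0.0, 1.0], "c4": [0.1, 0.9]}
graph = {"Alice": ["Bob"], "Bob": ["Alice"]}
entity_chunks = {"Alice": ["c1"], "Bob": ["c4"]}
print(hybrid_retrieve([1.0, 0.0], ["Alice"], chunk_vecs, graph, entity_chunks))
# ['c1', 'c2', 'c4']
```

Chunk `c4` is dissimilar to the query embedding and would be missed by vector search alone; it is recovered only because it mentions a graph neighbour of the query entity.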

omnilottie

OmniLottie

OmniLottie is the first end-to-end model capable of generating Lottie animations directly from text descriptions, images, or video clips — producing structured, editable JSON output rather than raster video. Built on a 4B vision-language model and trained on MMLottie-2M, a dataset of 2 million annotated animations, it introduces a custom Lottie tokenizer that makes complex vector animation learnable by a language model. Accepted to CVPR 2026.
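
Because the output is structured JSON rather than pixels, it can be inspected and edited directly. For readers unfamiliar with the format, the sketch below hand-builds a minimal Lottie document (a circle whose scale animates) using core keys of the public Lottie schema (`v`, `fr`, `ip`, `op`, `w`, `h`, `layers`). It is not OmniLottie output and has not been checked against every player; the version string, colors, and easing values are illustrative.

```python
import json

def static(value):
    """A non-animated Lottie property."""
    return {"a": 0, "k": value}

def animated(start, end, t0, t1):
    """An animated Lottie property: two keyframes with default-style easing handles."""
    ease_in = {"x": [0.833], "y": [0.833]}
    ease_out = {"x": [0.167], "y": [0.167]}
    return {"a": 1, "k": [
        {"t": t0, "s": start, "i": ease_in, "o": ease_out},
        {"t": t1, "s": end},
    ]}

def pulsing_circle(size=200, frames=60, fps=30):
    circle = {
        "ty": "gr",  # shape group
        "it": [
            {"ty": "el", "p": static([0, 0]), "s": static([80, 80])},         # ellipse
            {"ty": "fl", "c": static([0.2, 0.5, 1.0, 1]), "o": static(100)},  # fill
            {"ty": "tr", "p": static([0, 0]), "a": static([0, 0]),
             "s": static([100, 100]), "r": static(0), "o": static(100)},      # group transform
        ],
    }
    layer = {
        "ty": 4, "ind": 1, "ip": 0, "op": frames, "st": 0,  # shape layer
        "ks": {  # layer transform: scale animates from 100% to 150%
            "p": static([size // 2, size // 2, 0]),
            "a": static([0, 0, 0]),
            "s": animated([100, 100, 100], [150, 150, 100], 0, frames),
            "r": static(0),
            "o": static(100),
        },
        "shapes": [circle],
    }
    return {"v": "5.7.4", "fr": fps, "ip": 0, "op": frames,
            "w": size, "h": size, "layers": [layer]}

if __name__ == "__main__":
    print(json.dumps(pulsing_circle(), indent=2))
```

Editing the animation is a matter of changing values such as the keyframe `s` arrays or the fill color `c`, which is what makes structured output attractive compared with raster video.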

Iris_Flower_Model

Iris Flower Classification Challenge

Model for Iris Flower Classification Challenge

supertonic_2

Supertonic 2

Most TTS systems make you choose between speed, quality, and privacy. Supertonic 2 refuses that tradeoff: a featherweight 66M-parameter model that runs 167× faster than real time, entirely on your device, with zero network dependency. Powered by ONNX Runtime, it deploys across 11 platforms, from iOS to Rust to the browser, supports 5 languages, and, according to its authors, correctly reads complex real-world expressions that trip up major cloud TTS APIs.
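
"167× faster than real time" is easiest to read as a real-time factor (RTF): processing time divided by audio duration. The arithmetic below assumes the 167× figure holds uniformly; in practice it depends on hardware, text length, and settings.

```python
def synthesis_seconds(audio_seconds: float, speedup: float = 167.0) -> float:
    """Wall-clock time to synthesise a clip at `speedup` times faster than real time."""
    return audio_seconds / speedup

def real_time_factor(speedup: float) -> float:
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    return 1.0 / speedup

for clip in (10, 60, 600):
    print(f"{clip:>4} s of speech -> {synthesis_seconds(clip) * 1000:.0f} ms")
print(f"RTF: {real_time_factor(167.0):.4f}")
```

Under that assumption, a 10-second sentence takes about 60 ms to synthesise, which is why on-device use without a network round trip is practical.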

qwen3_tts

Qwen3-TTS

Qwen3-TTS is a family of multilingual, controllable, robust, and streaming text-to-speech models. Trained on over 5 million hours of speech data spanning 10 languages, it supports state-of-the-art voice cloning from as little as 3 seconds of reference audio, as well as voice control through natural-language descriptions.

firered_image_edit

FireRed-Image-Edit-1.1

Edit anything in an image with a simple text instruction. FireRed-Image-Edit covers the full spectrum of image editing — from swapping backgrounds and retouching portraits to restoring old photos and performing virtual try-on across multiple images. With leading benchmark scores among all open-source models and bilingual Chinese–English instruction support, it's built for both researchers and real-world applications.

helios_14b_realtime

Helios-Distilled

Generate up to 60 seconds of high-quality video from a text prompt, a single image, or an existing video clip — all running at real-time speeds on a single H100. Helios-Distilled is the most efficient variant of the 14B Helios family, distilled to 3 inference steps while maintaining strong visual coherence and temporal consistency. No KV-cache, no quantization, no anti-drifting hacks — just fast, clean video generation out of the box.
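
Step count matters because each denoising step costs one full forward pass of the 14B network, i.e. one function evaluation (NFE). The sketch below counts NFEs in a generic Euler sampler for a flow-matching-style ODE. The toy velocity function is a hypothetical stand-in that already points straight at a fixed target, the idealized behaviour few-step distillation aims for; it is not Helios's actual sampler or schedule.

```python
def euler_sample(velocity, x, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (data) with Euler steps.

    Returns the final sample and the number of function evaluations (NFEs) used.
    """
    nfe = 0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)  # one forward pass of the (toy) network per step
        nfe += 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, nfe

# Hypothetical toy "network": the exact straight-line velocity toward a fixed
# target, so a handful of steps suffices. A real distilled model learns this.
TARGET = [0.5, -1.0, 2.0]

def toy_velocity(x, t):
    return [(g - xi) / (1.0 - t) for g, xi in zip(TARGET, x)]

sample, nfe = euler_sample(toy_velocity, [0.0, 0.0, 0.0], steps=3)
print(nfe, [round(s, 6) for s in sample])  # 3 [0.5, -1.0, 2.0]
```

With a perfectly straight trajectory, three Euler steps land exactly on the target; a real model only approximates this, which is why distillation to very few steps is hard.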

Email_Spam_Model

Email Spam Classification Challenge

Model for Email Spam Classification Challenge

Pneumonia_Chest_X_Ray_Model

Pneumonia Chest X-Ray Classification Challenge

Model for Pneumonia Chest X-Ray Classification Challenge
