All Models

MiroThinker is an open-source research agent that advances tool-augmented reasoning by scaling along three dimensions: model size, context length, and interaction depth. It achieves state-of-the-art performance among open-source agents by enabling up to 600 tool calls per task within a 256K context window, demonstrating that interaction depth is a critical dimension for building next-generation research agents alongside model size and context length.

Other

PyTorch

ONNX

English

Chinese

by @draven-2890

130

Decoupled DMD Distillation

Decoupled DMD reveals that Distribution Matching Distillation's success in few-step diffusion models comes from two distinct mechanisms: CFG Augmentation as the core engine for multi-to-few-step conversion, and Distribution Matching as a regularizer for training stability. This new understanding enables principled improvements through decoupled noise schedules, achieving state-of-the-art performance in efficient image generation and powering the top-tier 8-step Z-Image model.

Apache-2.0

PyTorch

English

by @ankara-messi-6866

SHARP Monocular View Synthesis

SHARP (Single-image High-Accuracy Real-time Parallax) is a photorealistic view synthesis model that generates a high-quality 3D Gaussian representation from a single photograph in under one second. The representation supports real-time rendering of nearby views at over 100 FPS with metric camera movements. Compared to prior models, SHARP achieves state-of-the-art performance, reducing LPIPS by 25-34% and DISTS by 21-43% while being 1000× faster.

MIT

PyTorch

English

by @trassi-1990

140

LTX-2 Audiovisual Model

LTX-2 is an open-source audiovisual foundation model that generates high-quality, temporally synchronized video and audio content from text prompts. It features an asymmetric dual-stream transformer architecture (14B video + 5B audio parameters) with bidirectional cross-attention, achieving state-of-the-art quality among open-source systems while being 18× faster than comparable models and supporting up to 20 seconds of continuous generation.

Apache-2.0

Text-to-Video

PyTorch

Transformers

Multilingual

by @jonecarries-1954

131

Whisper Large V3 - Fast-Speech-to-Text (AIOZ Optimized)

High-performance Whisper Large V3 model optimized for AIOZ inference. Provides fast, accurate speech-to-text transcription across 100+ languages. Ideal for API services, SaaS automation, real-time transcription, and scalable audio processing.

MIT

Automatic Speech Recognition

Transformers

PyTorch

Multilingual

by @benoitb-1991