All Models
MiroThinker
MiroThinker is an open-source research agent that advances tool-augmented reasoning through model, context, and interactive scaling. It achieves state-of-the-art performance among open-source agents by enabling up to 600 tool calls per task within a 256K context window, demonstrating that interaction depth is a critical dimension for building next-generation research agents alongside model size and context length.
by @draven-2890
Decoupled DMD Distillation
Decoupled DMD reveals that Distribution Matching Distillation's success in few-step diffusion models comes from two distinct mechanisms: CFG Augmentation as the core engine for multi-to-few-step conversion, and Distribution Matching as a regularizer for training stability. This new understanding enables principled improvements through decoupled noise schedules, achieving state-of-the-art performance in efficient image generation and powering the top-tier 8-step Z-Image model.
SHARP Monocular View Synthesis
SHARP (Single-image High-Accuracy Real-time Parallax) is a photorealistic view synthesis model that generates high-quality 3D Gaussian representations from a single photograph in under one second. It supports real-time rendering of nearby views at over 100 FPS with metric camera movements. Compared to prior models, it achieves state-of-the-art performance, reducing LPIPS by 25-34% and DISTS by 21-43% while being 1000× faster.
by @trassi-1990
LTX-2 Audiovisual Model
LTX-2 is an open-source audiovisual foundation model that generates high-quality, temporally synchronized video and audio content from text prompts. It features an asymmetric dual-stream transformer architecture (14B video + 5B audio parameters) with bidirectional cross-attention, achieving state-of-the-art quality among open-source systems while being 18× faster than comparable models and supporting up to 20 seconds of continuous generation.
Whisper Large V3 - Fast-Speech-to-Text (AIOZ Optimized)
High-performance Whisper Large V3 model optimized for AIOZ inference. Provides fast, accurate speech-to-text transcription across 100+ languages. Ideal for API services, SaaS automation, real-time transcription, and scalable audio processing.