All Models

AIOZ AI
Miro Thinker

MiroThinker is an open-source research agent that advances tool-augmented reasoning through model, context, and interactive scaling. It achieves state-of-the-art performance among open-source agents by enabling up to 600 tool calls per task within a 256K context window, demonstrating that interaction depth is a critical dimension for building next-generation research agents alongside model size and context length.

Decoupled DMD Distillation

Decoupled DMD shows that the success of Distribution Matching Distillation (DMD) in few-step diffusion models comes from two distinct mechanisms: classifier-free guidance (CFG) augmentation, which is the core engine for converting a many-step model into a few-step one, and distribution matching, which acts as a regularizer for training stability. This understanding enables principled improvements such as decoupled noise schedules, achieving state-of-the-art performance in efficient image generation and powering the top-tier 8-step Z-Image model.

SHARP Monocular View Synthesis

SHARP (Single-image High-Accuracy Real-time Parallax) is a photorealistic view synthesis model that generates high-quality 3D Gaussian representations from a single photograph in under one second. It supports real-time rendering of nearby views at over 100 FPS with metric camera movements, achieving state-of-the-art performance by reducing LPIPS by 25-34% and DISTS by 21-43% compared to prior models while being 1000× faster.

LTX-2 Audiovisual Model

LTX-2 is an open-source audiovisual foundation model that generates high-quality, temporally synchronized video and audio content from text prompts. It features an asymmetric dual-stream transformer architecture (14B video + 5B audio parameters) with bidirectional cross-attention, achieving state-of-the-art quality among open-source systems while being 18× faster than comparable models and supporting up to 20 seconds of continuous generation.

Whisper Large V3 - Fast Speech-to-Text (AIOZ Optimized)

High-performance Whisper Large V3 model optimized for AIOZ inference. Provides fast, accurate speech-to-text transcription across 100+ languages. Ideal for API services, SaaS automation, real-time transcription, and scalable audio processing.

alien0000scaryhahaforreal

Face_anti_spoofing

Face anti spoofing

Face_anti_spoofing_model

mobilenet

mobilenet

Face Anti-Spoofing Challenge

Collections


Image-to-Text

Image-to-Text is an important task at the intersection of natural language processing and computer vision. Its purpose is to convert the information contained in an image into readable, understandable text.

Object Detection

Object Detection is an important task in computer vision and artificial intelligence. Its main objective is to detect objects within images or videos and determine their positions, typically as labeled bounding boxes.

Image to Image

Image-to-Image is an important image-processing task that converts an image from one format or representation into another.

Housing Prices Challenge

Predict house prices from related factors such as floor area and number of bedrooms.

Movie Reviews Challenge

Classify movie reviews as positive or negative.

Face Anti-Spoofing Challenge

The Face Anti-Spoofing Challenge aims to develop robust, accurate models that distinguish real human faces from spoofing attempts such as printed photos, replayed videos, and masks.

Text to Image

Text-to-Image is an important task in artificial intelligence and natural language processing. It aims to generate images from textual descriptions.