All Models
Decoupled DMD Distillation
Decoupled DMD reveals that Distribution Matching Distillation's success in few-step diffusion models comes from two distinct mechanisms: CFG Augmentation as the core engine for multi-to-few-step conversion, and Distribution Matching as a regularizer for training stability. This new understanding enables principled improvements through decoupled noise schedules, achieving state-of-the-art performance in efficient image generation and powering the top-tier 8-step Z-Image model.
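For readers unfamiliar with the CFG term above, here is a minimal sketch of the generic classifier-free guidance combination rule. It is not the Decoupled DMD training objective. The function name `cfg_combine` and the toy values are assumptions made for illustration.

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    extrapolate towards the conditional one by `guidance_scale`."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy model predictions for a 3-element "image" (illustrative values only).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]
guided = cfg_combine(uncond, cond, guidance_scale=3.0)
print(guided)
```

A scale of 1.0 reproduces the conditional prediction; larger scales push the result further in the direction the condition suggests.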
Face Anti-Spoofing Challenge
Model for Face Anti-Spoofing Challenge
by @AIOZAI
Background Removal
Background Removal is an image processing technique that separates the main object from the background of a photo. Removing the background highlights the product, subject, or character and gives the image a professional, aesthetically pleasing look.
by @AIOZAI
SHARP Monocular View Synthesis
SHARP (Single-image High-Accuracy Real-time Parallax) is a photorealistic view synthesis model that generates high-quality 3D Gaussian representations from a single photograph in under one second. It supports real-time rendering of nearby views at over 100 FPS with metric camera movements, achieving state-of-the-art performance by reducing LPIPS by 25-34% and DISTS by 21-43% compared to prior models while being 1000× faster.
by @trassi-1990
LTX-2 Audiovisual Model
LTX-2 is an open-source audiovisual foundation model that generates high-quality, temporally synchronized video and audio content from text prompts. It features an asymmetric dual-stream transformer architecture (14B video + 5B audio parameters) with bidirectional cross-attention, achieving state-of-the-art quality among open-source systems while being 18× faster than comparable models and supporting up to 20 seconds of continuous generation.
MiroThinker
MiroThinker is an open-source research agent that advances tool-augmented reasoning through model, context, and interactive scaling. It achieves state-of-the-art performance among open-source agents by enabling up to 600 tool calls per task within a 256K context window, demonstrating that interaction depth is a critical dimension for building next-generation research agents alongside model size and context length.
by @draven-2890
Image Super-Resolution with SMFANet
Image Super-Resolution with SMFANet uses the SMFANet architecture to enhance the resolution and quality of images. SMFANet is a deep learning network designed for super-resolution: it generates high-quality, detailed images from low-resolution inputs.
by @AIOZAI
Low-light Image Enhancement
Low-light Image Enhancement is a task focused on improving the quality and visibility of images captured in low-light conditions. It applies image processing techniques and algorithms to enhance details, reduce noise, and increase brightness in photos taken in dimly lit environments.
by @AIOZAI
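As a point of reference for the "increase brightness" step, the sketch below applies plain gamma correction to 8-bit intensities. This is a classical baseline, not the method used by the listed model. The function name and the default `gamma` are assumptions.

```python
def gamma_brighten(pixels, gamma=0.5):
    """Brighten 8-bit intensities with a power-law curve.
    gamma < 1 lifts dark values more than bright ones."""
    # Precompute a lookup table so each pixel is a single index operation.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in pixels]


# A tiny grayscale "image" with mostly dark values (illustrative data).
dark = [[0, 16, 64], [128, 200, 255]]
bright = gamma_brighten(dark, gamma=0.5)
print(bright)
```

Black and white stay fixed while mid-tones and shadows are raised, which is why gamma curves are a common first step before denoising.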
MediaPipe Face Detection
Face detection is a computer vision technique that identifies and locates human faces within an image or video. The goal is to detect the presence of faces and draw bounding boxes around them, without necessarily identifying specific facial features or landmarks.
by @AIOZAI
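MediaPipe's face detector reports each bounding box relative to the image size (`xmin`, `ymin`, `width`, `height` as fractions), so drawing it requires a conversion to pixels. The helper below is a small sketch of that conversion. Its name and the clamping behaviour are assumptions rather than part of MediaPipe.

```python
def relative_box_to_pixels(xmin, ymin, width, height, image_width, image_height):
    """Convert a box given as fractions of the image size into integer pixel
    corners (left, top, right, bottom), clamped to the image bounds."""
    left = max(0, int(xmin * image_width))
    top = max(0, int(ymin * image_height))
    right = min(image_width - 1, int((xmin + width) * image_width))
    bottom = min(image_height - 1, int((ymin + height) * image_height))
    return left, top, right, bottom


# Hypothetical detection covering the lower-middle part of a 640x480 frame.
box = relative_box_to_pixels(0.25, 0.5, 0.5, 0.25, image_width=640, image_height=480)
print(box)
```

Clamping matters because a face near the frame border can produce a box that extends slightly outside the image.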
MediaPipe Face Mesh Plotting
Face mesh detection, also known as facial landmark detection or face pose estimation, is the task of identifying and localizing specific keypoints or landmarks on a human face. It involves detecting the positions of facial features, such as eyes, eyebrows, nose, mouth, and jawline, in an image or video.
by @AIOZAI
Image Blending with Multiple Methods
Image Blending with Multiple Methods is a task that involves combining two or more images seamlessly to create a composite image using a variety of blending techniques. By leveraging multiple blending methods, such as alpha blending, gradient blending, or Laplacian pyramid blending, this task enables the merging of images while preserving the visual coherence and integrity of the final composition.
by @AIOZAI
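Of the methods named above, alpha blending is the simplest: each output pixel is a weighted average of the two inputs. A minimal sketch on nested lists of RGB tuples follows. The function name and the toy images are assumptions.

```python
def alpha_blend(foreground, background, alpha):
    """Per-pixel linear blend of two same-sized RGB images.
    alpha=1 keeps the foreground, alpha=0 keeps the background."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [
            tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg_px, bg_px))
            for fg_px, bg_px in zip(fg_row, bg_row)
        ]
        for fg_row, bg_row in zip(foreground, background)
    ]


# 1x2 red image blended over a 1x2 blue image (illustrative data).
fg = [[(255, 0, 0), (255, 0, 0)]]
bg = [[(0, 0, 255), (0, 0, 255)]]
mixed = alpha_blend(fg, bg, alpha=0.25)
print(mixed)
```

Gradient and Laplacian pyramid blending build on the same idea but vary the weight spatially or per frequency band to hide the seam.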
Video to Canny Edge
Video to Canny Edge converts a video into a Canny edge representation, in which the edges in each frame are emphasized and separated from the rest of the image. The Canny edge detector is a popular image processing algorithm that is widely used to detect edges in images and videos.
by @AIOZAI
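The sketch below shows only the gradient stage that Canny builds on: a Sobel filter followed by a single threshold. The full algorithm also smooths the input, thins edges with non-maximum suppression, and links them with hysteresis, and in practice a library routine such as OpenCV's `cv2.Canny` would be applied to each frame. All names and the test frame are assumptions.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(gray):
    """Sobel gradient magnitude of a 2D list of intensities.
    Border pixels are left at 0 for simplicity."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = math.hypot(gx, gy)
    return out


def threshold_edges(magnitude, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold."""
    return [[m >= threshold for m in row] for row in magnitude]


# One synthetic 5x6 frame: dark left half, bright right half.
frame = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mag = gradient_magnitude(frame)
edges = threshold_edges(mag, threshold=500)
print(edges[2])
```

Only the two columns straddling the brightness step respond, which is the raw signal that the later Canny stages thin down to a one-pixel edge.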
XFeat: Accelerated Features for Lightweight Image Matching
XFeat focuses on making image matching more efficient by using lightweight yet highly discriminative features. It employs optimized feature extraction techniques to identify key points and patterns in images, making it suitable for fast and accurate image comparison.
by @AIOZAI
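Once descriptors have been extracted from two images, a common way to pair them is mutual nearest-neighbour matching. The sketch below illustrates that generic technique on toy 2D descriptors. It is not taken from the XFeat code, and the function name and data are assumptions.

```python
def mutual_nearest_matches(desc_a, desc_b):
    """Return index pairs (i, j) such that desc_b[j] is the nearest neighbour
    of desc_a[i] and desc_a[i] is the nearest neighbour of desc_b[j]."""

    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    def nearest(query, candidates):
        return min(range(len(candidates)), key=lambda k: sq_dist(query, candidates[k]))

    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    # Keep a pair only when both directions agree.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy descriptors: the third entry of desc_a has no good partner in desc_b.
desc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.2]]
desc_b = [[0.0, 0.9], [1.0, 0.1]]
matches = mutual_nearest_matches(desc_a, desc_b)
print(matches)
```

The mutual check discards one-sided matches, which removes many false correspondences at almost no extra cost.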
DehazeFormer
DehazeFormer is a deep learning model designed for single image haze removal. Leveraging a transformer-based architecture, it effectively restores image clarity and contrast by removing haze, making it suitable for applications in autonomous driving, remote sensing, and image enhancement.
by @AIOZAI
Color Harmonization
Color Harmonization is a computational model designed to adjust and enhance the color balance of an image based on harmony templates, as proposed in the paper by Daniel Cohen-Or et al. The model improves visual aesthetics by aligning image colors with established color harmony principles. It supports multiple harmony templates and can be integrated with user interfaces for visual quality assessment using metrics from interfacemetrics.aalto.fi.
by @AIOZAI
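A harmony template can be pictured as one or more sectors on the hue wheel, with harmonization moving out-of-sector hues into a sector. The sketch below handles a single sector and simply clamps each hue to the nearest sector border. This is a simplification for illustration, not the weighted shift of the original method, and the function name, sector center, and 18-degree width are assumptions.

```python
import colorsys


def harmonize_pixel(rgb, center_deg, width_deg):
    """Move a pixel's hue into the sector centered at `center_deg` with
    angular width `width_deg`, keeping lightness and saturation unchanged."""
    r, g, b = (c / 255 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Signed angular distance from the sector center, in [-180, 180).
    offset = (h * 360 - center_deg + 180) % 360 - 180
    half = width_deg / 2
    clamped = max(-half, min(half, offset))
    new_h = ((center_deg + clamped) % 360) / 360
    return tuple(round(c * 255) for c in colorsys.hls_to_rgb(new_h, l, s))


# Pure green lies far outside a narrow sector around red (hue 0 degrees).
shifted = harmonize_pixel((0, 255, 0), center_deg=0, width_deg=18)
unchanged = harmonize_pixel((255, 0, 0), center_deg=0, width_deg=18)
print(shifted, unchanged)
```

Pixels already inside the sector are returned unchanged, while outliers land on the sector border closest to their original hue.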
Background Replacement
Background Replacement is a tool that lets users easily change the background of their images, opening up possibilities for creative transformations and visual enhancements.
by @AIOZAI
Image Super-Resolution with SeemoRe
Image Super-Resolution with SeemoRe is a task aimed at improving image super-resolution by leveraging expertise in the field. It incorporates techniques that identify and use expert knowledge or specialized information to make image upscaling more efficient and accurate.
by @AIOZAI
Color Extraction
Color Extraction is a computer vision task that extracts and analyzes colors from images or videos. Its objective is to identify and isolate specific colors or color ranges present in the visual data.
by @AIOZAI
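One simple way to identify the main colors in an image is to quantize pixels into coarse RGB buckets and count them. The sketch below does this with the standard library only. The function name, the bucket size, and the sample pixels are assumptions, and real pipelines often use clustering such as k-means instead.

```python
from collections import Counter


def dominant_colors(pixels, top_k=3, bucket=32):
    """Quantize RGB pixels into coarse buckets and return the centers of the
    `top_k` most populated buckets, most frequent first."""

    def quantize(channel):
        return (channel // bucket) * bucket + bucket // 2

    counts = Counter(tuple(quantize(c) for c in px) for px in pixels)
    return [color for color, _ in counts.most_common(top_k)]


# Ten pixels: eight slightly different reds and two blues (illustrative data).
pixels = [(250, 10, 10)] * 5 + [(245, 5, 12)] * 3 + [(10, 10, 240)] * 2
palette = dominant_colors(pixels, top_k=2)
print(palette)
```

Bucketing groups near-identical shades together, so small sensor noise does not split one visual color into many palette entries.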
Text Generation by LiteLlama
We present an open-source reproduction of Meta AI's LLaMa 2 with significantly reduced model sizes: LiteLlama-460M-1T has 460M parameters and was trained on 1T tokens.
by @AIOZAI
Archer Image Generator
Archer Image Generator uses Archer Diffusion, a specialized image generation model distributed as a Safetensors checkpoint and created by AI community user civitai. Derived from Stable Diffusion (SD 1.5), Archer Diffusion has been fine-tuned on a dataset of images generated by other AI models or contributed by users. This fine-tuning tailors its output to the use cases it was designed for, such as landscapes, nitrosocke, and archer.
by @AIOZAI