LTX-2 Audiovisual Model

LTX-2 is an open-source audiovisual foundation model that generates temporally synchronized video and audio from text prompts. It uses an asymmetric dual-stream transformer architecture (a 14B-parameter video stream and a 5B-parameter audio stream) coupled by bidirectional cross-attention. It is reported to achieve state-of-the-art quality among open-source systems while running 18× faster than comparable models, and it supports up to 20 seconds of continuous generation.
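
To illustrate what bidirectional cross-attention between two streams of different widths means, here is a minimal, dependency-free sketch in plain Python. It is not LTX-2's actual implementation: the single-head design, the residual update, the toy dimensions, and every name (`cross_attend`, `bidirectional_cross_attention`, `make_params`) are assumptions chosen for readability. A real model would use a tensor library such as PyTorch, multiple heads, and normalization layers.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random projection matrix with 1/sqrt(cols) scaling."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attend(tokens, context, Wq, Wk, Wv, Wo):
    """Single-head cross-attention: each token queries the other stream.

    Returns the tokens with a residual update, keeping their own width.
    """
    keys = [matvec(Wk, c) for c in context]
    values = [matvec(Wv, c) for c in context]
    d = len(keys[0])
    out = []
    for tok in tokens:
        q = matvec(Wq, tok)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]
        update = matvec(Wo, mixed)
        out.append([t + u for t, u in zip(tok, update)])
    return out


def make_params(video_dim, audio_dim, attn_dim, seed=0):
    """Hypothetical projection weights (Wq, Wk, Wv, Wo) for each direction."""
    rng = random.Random(seed)
    return {
        "video_from_audio": (
            rand_matrix(attn_dim, video_dim, rng),
            rand_matrix(attn_dim, audio_dim, rng),
            rand_matrix(attn_dim, audio_dim, rng),
            rand_matrix(video_dim, attn_dim, rng),
        ),
        "audio_from_video": (
            rand_matrix(attn_dim, audio_dim, rng),
            rand_matrix(attn_dim, video_dim, rng),
            rand_matrix(attn_dim, video_dim, rng),
            rand_matrix(audio_dim, attn_dim, rng),
        ),
    }


def bidirectional_cross_attention(video, audio, params):
    """Both streams read from each other's original tokens in the same step."""
    new_video = cross_attend(video, audio, *params["video_from_audio"])
    new_audio = cross_attend(audio, video, *params["audio_from_video"])
    return new_video, new_audio


# Toy example: a wide video stream (width 8) and a narrower audio stream (width 4).
rng = random.Random(1)
video = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
audio = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
params = make_params(video_dim=8, audio_dim=4, attn_dim=4)
new_video, new_audio = bidirectional_cross_attention(video, audio, params)
print(len(new_video), len(new_video[0]), len(new_audio), len(new_audio[0]))
```

The asymmetry shows up only in the projection shapes: each stream keeps its own token count and width, while queries, keys, and values meet in a shared attention dimension. Computing both directions from the original inputs keeps the exchange symmetric, so neither stream sees an already-updated version of the other within the same step.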

Apache-2.0
Text-to-Video
PyTorch
Transformers
Multilingual
by @jonecarries-1954

Last updated: 5 months ago

