Open Source AI Video Model Directory
Curated directory of open source AI video generators, editors, and avatar models (29 tracked models).
| Model | Description | Creator | Release date | GitHub / Repo | Paper / Docs |
|---|---|---|---|---|---|
| LongCat-Video | 13.6B foundational video generator with long-term coherence. Supports image/video/text input and continuation. | Meituan | Oct 28, 2025 | LongCat-Video | arXiv:2510.22200 |
| Ditto | Instruction-based video editing framework. Supports high-fidelity scene/subject/style edits using natural-language instructions, built on a large curated instruction dataset for video editing. | Ditto Team (EzioBy) | Oct 17, 2025 | github.com/EzioBy/Ditto | arXiv:2510.15742 |
| FlashVSR | Real-time 4K video upscaling with diffusion. | OpenImagingLab | Oct 14, 2025 | FlashVSR | arXiv:2510.12747 |
| MoCha | End-to-end character replacement in video without keypoints. Requires only a first-frame mask and a reference image. | Orange 3DV Team | Oct 2025 | MoCha | arXiv:2503.23307 |
| Ovi | Video + audio generation from text/image prompts. Twin diffusion backbone for video and audio. | Character AI | Sep 30, 2025 | Ovi | arXiv:2510.01284 |
| Wan-Alpha | High-quality text-to-video generation model supporting alpha-channel / transparent-background outputs. Built on the Wan2.1-T2V-14B backbone, with LightX2V for fast inference and alpha compositing. | WeChatCV (WeChat CV Lab) | Sep 30, 2025 (v1.0 release) | WeChatCV/Wan-Alpha | arXiv:2509.24979 |
| Wan-Animate | Character animation and replacement model that uses a video as motion reference. Integrated with Wan 2.2. | Alibaba Tongyi Lab | Sep 19, 2025 | Wan2.2 Animate | arXiv:2509.14055 |
| Lynx | High-fidelity personalized video generation model focused on identity preservation. Generates new videos of a specific person from one reference image using ID-adapters and Ref-adapters for facial detail control. | ByteDance | Sep 18, 2025 | github.com/bytedance/lynx | arXiv:2509.15496 |
| Lucy Edit | Text-guided video editing model enabling object, style, character, and scene edits while preserving original motion. Built on a Wan2.2-5B-based architecture with efficient edit-conditioning. | DecartAI | Sep 18, 2025 | github.com/DecartAI/Lucy-Edit-ComfyUI | Lucy Edit Paper |
| HuMo | Multimodal (text/image/audio) model for talking human videos with strong subject and lip-sync consistency. | ByteDance | Sep 10, 2025 | HuMo | arXiv:2509.08519 |
| Stand-In | Plug-and-play module for maintaining facial identity during video generation across scenes or styles. | Tencent WeChat CV Lab | Sep 2025 | Stand-In | arXiv:2508.07901 |
| InfiniteTalk | Audio-driven long-form talking-video generator. Produces image-to-video and video-to-video talking portraits with full-body, head, and lip synchronization; supports unlimited video length and sparse-frame generation. | MeiGen-AI | Aug 19, 2025 | github.com/MeiGen-AI/InfiniteTalk | arXiv:2508.14033 |
| Wan 2.2 (14B) | Second-gen Wan model with Mixture-of-Experts. Enables cinematic 720p videos with better aesthetic and physical control. | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314 |
| Wan 2.2 (5B) | Lightweight dense version of Wan 2.2 with a 3D-aware VAE. Can generate 5-sec 720p/24FPS video on a single high-end GPU. | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314 |
| ReCamMaster | Novel-view video generation via camera trajectory input. Enables re-rendering existing videos along new camera trajectories. | Kuaishou & Zhejiang Univ | Jul 9, 2025 | ReCamMaster | arXiv:2503.11647 |
| FantasyPortrait | Multi-character animation with expression-level control. Synchronized expressions across faces. | Alibaba AMAP Lab | Jul 2025 | FantasyPortrait | arXiv:2507.12956 |
| EchoShot | Multi-shot video generation of the same subject with coherent identity across shots. | Beihang Univ / D2I Lab | Jul 2025 | EchoShot | arXiv:2506.15838 |
| MTVCraft | Audio-video generation framework that splits a text prompt into separate audio streams and generates visuals aligned to them. | BAAI | Jun 2025 | MTVCraft | arXiv:2506.08003 |
| Phantom | Identity-preserving text-and-image-to-video framework. Integrates with the Wan backbone and uses multi-subject memory. | ByteDance | May 27, 2025 | Phantom | arXiv:2502.11079 |
| ATI | Adds trajectory control to Wan models via a lightweight conditioning layer. | ByteDance | May 2025 | ATI | arXiv:2505.22944 |
| MiniMax-Remover | Object removal model trained with minimax optimization and distilled for fast inference. | Fudan Univ & Tencent | May 2025 | MiniMax-Remover | arXiv:2505.24873 |
| MultiTalk | Audio-driven multi-character video generation framework. Supports distinct voices and identity-mapped lipsync. | MeiGen-AI | May 2025 | MultiTalk | arXiv:2505.22647 |
| Hunyuan Avatar | Multi-character audio-driven avatar video generator. Supports emotion-aware speech animation, multi-speaker dialog videos, and realistic expression/motion using a multimodal diffusion transformer. | Tencent Hunyuan Lab | May 2025 | github.com/Tencent-Hunyuan/HunyuanVideo-Avatar | arXiv:2505.20156 |
| Uni3C | 3D-enhanced model with simultaneous camera and human pose control for video generation. | Alibaba DAMO | Apr 2025 | Uni3C | arXiv:2504.14899 |
| FantasyTalking | Talking-head video generator using portrait + audio. Includes body/gesture motion and emotion control. | Alibaba AMAP Lab | Apr 2025 | FantasyTalking | arXiv:2504.04842 |
| SkyReels V2 | Infinite-length text/image-to-video model with autoregressive stitching and cinematic control features. | Skywork AI | Apr 2025 | SkyReels | arXiv:2504.13074 |
| VACE | Unified framework for video creation and editing. Combines motion control, style, object manipulation, and more into one architecture. | Alibaba DAMO / Tongyi Lab | Mar 2025 | VACE | arXiv:2503.07598 |
| Wan 2.1 | First open-source model in the Wan series, in 14B and 1.3B versions. Handles text-to-video and image-to-video generation with strong object motion, scene consistency, and bilingual prompt support (quick-start sketch below the table). | Alibaba PAI / Tongyi Lab | Feb 27, 2025 | Wan2.1 | arXiv:2503.20314 |
| LivePortrait | Efficient portrait animation framework that transforms a single still image into a lifelike video with head/eye/face motion, and supports stitching and retargeting control for high-quality output. | Kuaishou Technology (KwaiVGI) | Jul 4, 2024 (code release) | LivePortrait | arXiv:2407.03168 |
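For a sense of how the Wan family is typically driven, the sketch below shows a minimal text-to-video quick start for Wan 2.1 through the Hugging Face diffusers integration. The model ID, resolution, and frame count follow the public diffusers documentation, but treat them as assumptions to verify against your installed version.

```python
# Minimal sketch: Wan 2.1 text-to-video via Hugging Face diffusers.
# Assumes a diffusers release with Wan support; model ID and defaults
# follow the public docs and may differ in newer versions.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"

# The Wan VAE is loaded in float32 for numerical stability;
# the diffusion transformer itself runs in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walks on the grass, realistic style",
    negative_prompt="blurry, low quality, distorted",
    height=480,
    width=832,
    num_frames=81,       # roughly 5 seconds at 16 fps
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "wan_t2v.mp4", fps=16)
```

The 14B checkpoints of Wan 2.1 and Wan 2.2 expose the same pipeline interface but need far more VRAM, so offloading or multi-GPU setups are the norm there.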
Ready to test?
Try Wan-Animate and LivePortrait
Upload a character, record a driving video, and let our pipeline handle motion transfer, lip sync, and model deployment for you.
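The LivePortrait half of that workflow can also be run locally: its repo ships a reference inference script that takes exactly these two inputs, a source portrait and a driving video. A minimal sketch, assuming the -s/-d flags documented in the repo's README (verify against the version you clone):

```python
# Minimal sketch: animate a still portrait with LivePortrait's reference
# inference script. Assumes the repo (github.com/KwaiVGI/LivePortrait) is
# cloned and its environment installed; flags follow the README and may
# change between versions. File names here are placeholders.
import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "-s", "my_character.png",   # source: the still character portrait
        "-d", "driving_take.mp4",   # driving: the recorded motion/lip-sync video
    ],
    check=True,
    cwd="LivePortrait",  # run from the cloned repo root
)
```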
