Open Source AI Video Model Directory
Curated directory of open source AI video generators, editors, and avatar models
43 Tracked Models
| Model | Description | Tutorial | Creator | Release date | GitHub / Repo | Paper / Docs |
|---|---|---|---|---|---|---|
| LTX-2 | DiT-based audio-video foundation model supporting text-to-video and image-to-video, with outputs up to 4K resolution. A Wan 2.2 competitor. | — | Lightricks | Jan 2026 | LTX-2 | LTX-2 Technical Report |
| StoryMem | Multi-shot long video storytelling with memory. Built on Wan 2.2 14B. | — | S-Lab, Nanyang Technological University & ByteDance | Dec 22, 2025 | StoryMem | arXiv:2512.19539 |
| LongCat Video Avatar | Speech-driven talking avatar model built on the LongCat stack; turns a single portrait into a video avatar based on input audio and reference image. | — | MeiGen-AI | Dec 16, 2025 | LongCat Video Avatar | Tech Report |
| Wan-Move | Adds point-level motion control to the Wan 2.1 image-to-video model, letting you drag or trace trajectories and have the video follow them. | — | Alibaba PAI / Tongyi Lab | Dec 10, 2025 | Wan-Move | arXiv:2512.08765 |
| SCAIL | Animates characters from a control video. Strong at multi-character dance videos; does not capture detailed facial movement. Built on Wan 2.1. | — | ZAI | Dec 5, 2025 | SCAIL | arXiv:2512.05905 |
| LiveAvatar | Real-time talking avatar model built on Wan 2.2 S2V. Turns a reference face plus audio into smooth streaming video tuned for low-latency playback. | — | Alibaba Quark | Dec 4, 2025 | LiveAvatar | arXiv:2512.04677 |
| One-to-All-Animation | Portrait animation that improves alignment between the reference character and the control video's motion. Built on Wan 2.1. | — | Jiangnan Univ; USTC; Chinese Academy of Sciences; BUPT; Zhejiang Univ | Nov 28, 2025 | One-to-All-Animation | arXiv:2511.22940 |
| SteadyDancer | Takes a reference image and a control video and outputs a video of the reference character moving like the person in the control video. Well suited to dance and other full-body motion. Built on Wan 2.1. | — | MCG, Nanjing Univ. & Tencent PCG | Nov 24, 2025 | SteadyDancer | arXiv:2511.19320 |
| HunyuanVideo-1.5 | Lightweight 8.3B-parameter video generation model supporting text-to-video and image-to-video, using SSTA attention, a 3D causal VAE, and 1080p super-resolution. | — | Tencent Hunyuan Lab | Nov 21, 2025 | HunyuanVideo-1.5 | Report |
| SAM 3 | Unified promptable segmentation model for images and videos: supports text/box/mask prompts, open-vocabulary concepts, and consistent video tracking. | — | Meta AI / Facebook Research | Nov 19, 2025 | SAM3 | SAM 3 Paper |
| FFGO | First Frame Is the Place to Go: lightweight video content customization that treats the first frame as a memory buffer for mixing multiple reference images. Goes from an image board to a video containing the reference images. Can be used as a LoRA with Wan 2.2. | — | Univ. of Maryland / USC / MIT | Nov 19, 2025 | FFGO-Video-Customization | arXiv:2511.15700 |
| Time-to-Move (TTM) | Lightweight add-on that helps image-to-video models produce smoother, more natural motion. You guide the movement with simple inputs such as arrows, poses, or example motion, and TTM applies that motion while keeping the original image's look consistent. | — | A. Singer, N. Rotstein, A. Mann, R. Kimmel, O. Litany | Nov 9, 2025 | TTM | arXiv:2511.08633 |
| LongCat-Video | 13.6B foundational video generator with long-term coherence. Supports image/video/text input and continuation. | — | Meituan | Oct 28, 2025 | LongCat-Video | arXiv:2510.22200 |
| Ditto | Instruction-based video editing framework. Supports high-fidelity scene/subject/style edits using natural-language instructions, built on a large curated instruction dataset for video editing. | — | Ditto Team (EzioBy) | Oct 17, 2025 | github.com/EzioBy/Ditto | arxiv.org/abs/2510.15742 |
| FlashVSR | Real-time 4K video upscaling with diffusion. | — | OpenImagingLab | Oct 14, 2025 | FlashVSR | arXiv:2510.12747 |
| MoCha | End-to-end character replacement in video without keypoints. Requires only first-frame mask and reference. | — | Orange 3DV Team | Oct 2025 | MoCha | arXiv:2503.23307 |
| Krea-Realtime | Real-time generative video model and toolkit for live prompting and streaming outputs. Krea Realtime 14B is distilled from the Wan 2.1 14B text-to-video model using Self-Forcing, a technique for converting regular video diffusion models into autoregressive models. | — | Krea AI | Oct 2025 | Krea-Realtime | Krea Realtime 14B Blog |
| Ovi | Video + audio generation from text/image prompts. Twin diffusion backbone for video and audio. | — | Character AI | Sep 30, 2025 | Ovi | arXiv:2510.01284 |
| Wan-Alpha | High-quality text-to-video model supporting alpha-channel / transparent-background outputs. Built on the Wan 2.1 T2V-14B backbone, with LightX2V for fast inference and alpha compositing. | — | WeChatCV (WeChat CV Lab) | Sep 30, 2025 (v1.0 release) | WeChatCV/Wan-Alpha | arXiv:2509.24979 |
| WanAnimate | Character animation & replacement model using video as reference. Integrated with Wan 2.2. | — | Alibaba Tongyi Lab | Sep 19, 2025 | Wan2.2 Animate | arXiv:2509.14055 |
| Lynx | High-fidelity personalized video generation model focused on identity preservation. Generates new videos of a specific person from one reference image using ID-adapters and Ref-adapters for facial detail control. | — | ByteDance | Sep 18, 2025 | github.com/bytedance/lynx | arxiv.org/abs/2509.15496 |
| Lucy Edit | Text-guided video editing model enabling object, style, character, and scene edits while preserving original motion. Built on Wan2.2-5B-based architecture with efficient edit-conditioning. | — | DecartAI | Sep 18, 2025 | github.com/DecartAI/Lucy-Edit-ComfyUI | Lucy Edit Paper |
| HuMo | Multimodal (text/image/audio) model for talking human videos with strong subject and lip-sync consistency. | — | ByteDance | Sep 10, 2025 | HuMo | arXiv:2509.08519 |
| Stand-In | Plug-and-play module for maintaining facial identity during video generation across scenes or styles. | — | Tencent WeChat CV Lab | Sep 2025 | Stand-In | arXiv:2508.07901 |
| InfiniteTalk | Audio-driven long-form talking-video generator. Produces image-to-video and video-to-video talking portraits with full-body, head, and lip synchronization; supports unlimited video length and sparse-frame generation. | — | MeiGen-AI | Aug 19, 2025 | github.com/MeiGen-AI/InfiniteTalk | arxiv.org/abs/2508.14033 |
| Wan 2.2 (14B) | Second-gen Wan model with Mixture-of-Experts. Enables cinematic 720p videos with better aesthetic and physical control. | — | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314 |
| Wan 2.2 (5B) | Lightweight dense version of Wan 2.2 with a 3D-aware VAE. Can generate 5-sec 720p/24 fps video on a single high-end GPU. | — | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314 |
| ReCamMaster | Novel-view video generation via camera trajectory input. Enables re-rendering videos with new motion. | — | Kuaishou & Zhejiang Univ | Jul 9, 2025 | ReCamMaster | arXiv:2503.11647 |
| FantasyPortrait | Multi-character animation with expression-level control. Synchronized expressions across faces. | — | Alibaba AMAP Lab | Jul 2025 | FantasyPortrait | arXiv:2507.12956 |
| EchoShot | Multi-shot video generation of same subject with coherent identity across shots. | — | Beihang Univ / D2I Lab | Jul 2025 | EchoShot | arXiv:2506.15838 |
| MTVCraft | Audio-video generation framework that splits text into sound streams and aligns visuals. | — | BAAI | Jun 2025 | MTVCraft | arXiv:2506.08003 |
| Phantom | Identity-preserving text+image to video framework. Integrates with Wan backbone and uses multi-subject memory. | — | ByteDance | May 27, 2025 | Phantom | arXiv:2502.11079 |
| ATI | Adds trajectory control to Wan models via a lightweight conditioning layer. | — | ByteDance | May 2025 | ATI | arXiv:2505.22944 |
| MiniMax-Remover | Object removal model trained with minimax optimization and distilled for fast inference. | — | Fudan Univ & Tencent | May 2025 | MiniMax-Remover | arXiv:2505.24873 |
| MultiTalk | Audio-driven multi-character video generation framework. Supports distinct voices and identity-mapped lipsync. | — | MeiGen-AI | May 2025 | MultiTalk | arXiv:2505.22647 |
| Hunyuan Avatar | Multi-character audio-driven avatar video generator. Supports emotion-aware speech animation, multi-speaker dialog videos, and realistic expression/motion using a multimodal diffusion transformer. | — | Tencent Hunyuan Lab | May 2025 | github.com/Tencent-Hunyuan/HunyuanVideo-Avatar | arxiv.org/abs/2505.20156 |
| Uni3C | 3D-enhanced model with simultaneous camera and human pose control for video generation. | — | Alibaba DAMO | Apr 2025 | Uni3C | arXiv:2504.14899 |
| FantasyTalking | Talking-head video generator using portrait + audio. Includes body/gesture motion and emotion control. | — | Alibaba AMAP Lab | Apr 2025 | FantasyTalking | arXiv:2504.04842 |
| SkyReels V2 | Infinite-length text/image-to-video model with autoregressive stitching and cinematic control features. | — | Skywork AI | Apr 2025 | SkyReels | arXiv:2504.13074 |
| VACE | Unified framework for video creation and editing. Combines motion control, style, object manipulation, and more into one architecture. | — | Alibaba DAMO / Tongyi Lab | Mar 2025 | VACE | arXiv:2503.07598 |
| Wan 2.1 | First open-source model in the Wan series. 14B/1.3B versions. Handles text-to-video/image generation with strong object motion, scene consistency, and bilingual prompt support. | — | Alibaba PAI / Tongyi Lab | Feb 27, 2025 | Wan2.1 | Wan Paper (arXiv) |
| VEnhancer | Open-source video enhancer for sharpening and upscaling AI-generated videos. | — | Vchitect | Sep 2024 | VEnhancer | arXiv:2407.07667 |
| LivePortrait | Efficient portrait animation framework that transforms a single still image into a lifelike video with head/eye/face motion, and supports stitching and retargeting control for high-quality output. | — | Kuaishou Technology (KwaiVGI) | Jul 4, 2024 (code release) | LivePortrait | arXiv:2407.03168 |
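Many of the entries above build on the Wan 2.1/2.2 backbone, so running one locally gives a feel for the whole family. The sketch below is a minimal, non-authoritative text-to-video example assuming the Hugging Face `diffusers` integration (`WanPipeline`, `AutoencoderKLWan`) and a Diffusers-format checkpoint such as `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`; class names, checkpoint IDs, and defaults vary by version, and the official Wan repos ship their own inference scripts.

```python
# Minimal sketch: Wan 2.1 text-to-video via Hugging Face diffusers (assumed integration).
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"  # assumed Diffusers-format checkpoint ID

# The Wan VAE is commonly kept in float32 for stability; the transformer runs in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walking through tall grass, golden hour, realistic style",
    negative_prompt="blurry, low quality, distorted",
    height=480,
    width=832,
    num_frames=81,      # roughly 5 seconds at 16 fps
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "wan_t2v.mp4", fps=16)
```

Larger checkpoints (e.g. the 14B variants) follow the same pattern but need CPU offloading or a multi-GPU setup to fit in memory.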
Ready to test?
Try Wan Animate and LivePortrait
Upload a character, record a driving video, and let our pipeline handle motion transfer, lip sync, and model deployment for you.
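Prefer to run LivePortrait locally instead? The sketch below shows the general shape of a run. It assumes the KwaiVGI/LivePortrait repository is cloned with its dependencies installed; the `-s` (source image) and `-d` (driving video) flags follow the repo's README at the time of writing, and the file paths are placeholders.

```python
# Minimal sketch: animate a still portrait with LivePortrait's CLI from Python.
# Assumes the KwaiVGI/LivePortrait repo is cloned into ./LivePortrait and set up per its README.
import subprocess
from pathlib import Path

source_image = Path("assets/my_character.jpg").resolve()   # placeholder: the portrait to animate
driving_video = Path("assets/my_driving.mp4").resolve()    # placeholder: the recorded driving video

subprocess.run(
    [
        "python", "inference.py",
        "-s", str(source_image),   # source portrait
        "-d", str(driving_video),  # driving video whose motion is transferred
    ],
    cwd="LivePortrait",  # run from the cloned repo root
    check=True,
)
```

Results are written to the repo's default output directory; motion-transfer models such as WanAnimate follow a similar reference-plus-driving-video workflow, but with their own scripts and flags.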