Open Source AI Video Model Directory
Curated directory of open source AI video generators, editors, and avatar models
43 Tracked Models
| Model | Description | Tutorial | Creator | Release date | GitHub / Repo | Paper / Docs |
|---|---|---|---|---|---|---|
| LTX-2 | DiT-based audio-video foundation model supporting text-to-video and image-to-video, with outputs up to 4K resolution. A Wan 2.2 competitor. | — | Lightricks | Jan 2026 | LTX-2 | LTX-2 Technical Report |
| StoryMem | Multi-shot long video storytelling with memory. Built on Wan 2.2 14B. | — | S-Lab, Nanyang Technological University & ByteDance | Dec 22, 2025 | StoryMem | arXiv:2512.19539 |
| LongCat Video Avatar | Speech-driven talking avatar model built on the LongCat stack; turns a single portrait into a video avatar based on input audio and reference image. | — | MeiGen-AI | Dec 16, 2025 | LongCat Video Avatar | Tech Report |
| Wan-Move | Adds point-level motion control to the Wan 2.1 image-to-video model, letting you drag or trace trajectories and have the video follow them. | — | Alibaba PAI / Tongyi Lab | Dec 10, 2025 | Wan-Move | arXiv:2512.08765 |
| SCAIL | Animates characters from a control video. Strong at multi-character dance videos; does not capture detailed facial movement. Built on Wan 2.1. | — | ZAI | Dec 5, 2025 | SCAIL | arXiv:2512.05905 |
| LiveAvatar | Real-time talking avatar model built on Wan 2.2 S2V. Turns a reference face plus audio into smooth streaming video tuned for low-latency playback. | — | Alibaba Quark | Dec 4, 2025 | LiveAvatar | arXiv:2512.04677 |
| One-to-All-Animation | Portrait animation that improves alignment between the reference character and the control video's motion. Built on Wan 2.1. | — | Jiangnan Univ; USTC; Chinese Academy of Sciences; BUPT; Zhejiang Univ | Nov 28, 2025 | One-to-All-Animation | arXiv:2511.22940 |
| SteadyDancer | Takes a reference image and a control video and outputs a video of the reference character moving like the person in the control video. Well suited to dance and other full-body motion. Built on Wan 2.1. | — | MCG, Nanjing Univ. & Tencent PCG | Nov 24, 2025 | SteadyDancer | arXiv:2511.19320 |
| HunyuanVideo-1.5 | Lightweight 8.3B-parameter video generation model supporting text-to-video and image-to-video, using SSTA attention, a 3D causal VAE, and 1080p super-resolution. | — | Tencent Hunyuan Lab | Nov 21, 2025 | HunyuanVideo-1.5 | Report |
| SAM 3 | Unified promptable segmentation model for images and videos: supports text/box/mask prompts, open-vocabulary concepts, and consistent video tracking. | — | Meta AI / Facebook Research | Nov 19, 2025 | SAM3 | SAM 3 Paper |
| FFGO | First Frame Is the Place to Go: lightweight video content customization that treats the first frame as a memory buffer for mixing multiple reference images. Goes from an image board to a video containing the reference images. Can be used as a LoRA with Wan 2.2. | — | Univ. of Maryland / USC / MIT | Nov 19, 2025 | FFGO-Video-Customization | arXiv:2511.15700 |
| Time-to-Move (TTM) | Lightweight add-on that helps image-to-video models produce smoother, more natural motion. You guide the movement with simple inputs such as arrows, poses, or example motion, and TTM applies that motion while keeping the original image's look consistent. | — | A. Singer, N. Rotstein, A. Mann, R. Kimmel, O. Litany | Nov 9, 2025 | TTM | arXiv:2511.08633 |
| LongCat-Video | 13.6B foundational video generator with long-term coherence. Supports image/video/text input and continuation. | — | Meituan | Oct 28, 2025 | LongCat-Video | arXiv:2510.22200 |
| Ditto | Instruction-based video editing framework. Supports high-fidelity scene/subject/style edits using natural-language instructions, built on a large curated instruction dataset for video editing. | — | Ditto Team (EzioBy) | Oct 17, 2025 | github.com/EzioBy/Ditto | arxiv.org/abs/2510.15742 |
| FlashVSR | Real-time 4K video upscaling with diffusion. | — | OpenImagingLab | Oct 14, 2025 | FlashVSR | arXiv:2510.12747 |
| MoCha | End-to-end character replacement in video without keypoints. Requires only first-frame mask and reference. | — | Orange 3DV Team | Oct 2025 | MoCha | arXiv:2503.23307 |
| Krea-Realtime | Real-time generative video model and toolkit for live prompting and streaming outputs. Krea Realtime 14B is distilled from the Wan 2.1 14B text-to-video model using Self-Forcing, a technique for converting regular video diffusion models into autoregressive models. | — | Krea AI | Oct 2025 | Krea-Realtime | Krea Realtime 14B Blog |
| Ovi | Video + audio generation from text/image prompts. Twin diffusion backbone for video and audio. | — | Character AI | Sep 30, 2025 | Ovi | arXiv:2510.01284 |
| Wan-Alpha | High-quality text-to-video model supporting alpha-channel / transparent-background outputs. Built on the Wan 2.1 T2V-14B backbone, with LightX2V for fast inference and alpha compositing. | — | WeChatCV (WeChat CV Lab) | Sep 30, 2025 (v1.0 release) | WeChatCV/Wan-Alpha | arXiv:2509.24979 |
| WanAnimate | Character animation & replacement model using video as reference. Integrated with Wan 2.2. | — | Alibaba Tongyi Lab | Sep 19, 2025 | Wan2.2 Animate | arXiv:2509.14055 |
| Lynx | High-fidelity personalized video generation model focused on identity preservation. Generates new videos of a specific person from one reference image using ID-adapters and Ref-adapters for facial detail control. | — | ByteDance | Sep 18, 2025 | github.com/bytedance/lynx | arxiv.org/abs/2509.15496 |
| Lucy Edit | Text-guided video editing model enabling object, style, character, and scene edits while preserving original motion. Built on Wan2.2-5B-based architecture with efficient edit-conditioning. | — | DecartAI | Sep 18, 2025 | github.com/DecartAI/Lucy-Edit-ComfyUI | Lucy Edit Paper |
| HuMo | Multimodal (text/image/audio) model for talking human videos with strong subject and lip-sync consistency. | — | ByteDance | Sep 10, 2025 | HuMo | arXiv:2509.08519 |
| Stand-In | Plug-and-play module for maintaining facial identity during video generation across scenes or styles. | — | Tencent WeChat CV Lab | Sep 2025 | Stand-In | arXiv:2508.07901 |
| InfiniteTalk | Audio-driven long-form talking-video generator. Produces image-to-video and video-to-video talking portraits with full-body, head, and lip synchronization; supports unlimited video length and sparse-frame generation. | — | MeiGen-AI | Aug 19, 2025 | github.com/MeiGen-AI/InfiniteTalk | arxiv.org/abs/2508.14033 |
| Wan 2.2 (14B) | Second-gen Wan model with Mixture-of-Experts. Enables cinematic 720p videos with better aesthetic and physical control. | — | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314 |
| Wan 2.2 (5B) | Lightweight dense version of Wan 2.2 with a 3D-aware VAE. Can generate 5-sec 720p/24 fps video on a single high-end GPU. | — | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314 |
| ReCamMaster | Novel-view video generation via camera trajectory input. Enables re-rendering videos with new motion. | — | Kuaishou & Zhejiang Univ | Jul 9, 2025 | ReCamMaster | arXiv:2503.11647 |
| FantasyPortrait | Multi-character animation with expression-level control. Synchronized expressions across faces. | — | Alibaba AMAP Lab | Jul 2025 | FantasyPortrait | arXiv:2507.12956 |
| EchoShot | Multi-shot video generation of same subject with coherent identity across shots. | — | Beihang Univ / D2I Lab | Jul 2025 | EchoShot | arXiv:2506.15838 |
| MTVCraft | Audio-video generation framework that splits text into sound streams and aligns visuals. | — | BAAI | Jun 2025 | MTVCraft | arXiv:2506.08003 |
| Phantom | Identity-preserving text+image to video framework. Integrates with Wan backbone and uses multi-subject memory. | — | ByteDance | May 27, 2025 | Phantom | arXiv:2502.11079 |
| ATI | Adds trajectory control to Wan models via a lightweight conditioning layer. | — | ByteDance | May 2025 | ATI | arXiv:2505.22944 |
| MiniMax-Remover | Object removal model trained with minimax optimization and distilled for fast inference. | — | Fudan Univ & Tencent | May 2025 | MiniMax-Remover | arXiv:2505.24873 |
| MultiTalk | Audio-driven multi-character video generation framework. Supports distinct voices and identity-mapped lipsync. | — | MeiGen-AI | May 2025 | MultiTalk | arXiv:2505.22647 |
| Hunyuan Avatar | Multi-character audio-driven avatar video generator. Supports emotion-aware speech animation, multi-speaker dialog videos, and realistic expression/motion using a multimodal diffusion transformer. | — | Tencent Hunyuan Lab | May 2025 | github.com/Tencent-Hunyuan/HunyuanVideo-Avatar | arxiv.org/abs/2505.20156 |
| Uni3C | 3D-enhanced model with simultaneous camera and human pose control for video generation. | — | Alibaba DAMO | Apr 2025 | Uni3C | arXiv:2504.14899 |
| FantasyTalking | Talking-head video generator using portrait + audio. Includes body/gesture motion and emotion control. | — | Alibaba AMAP Lab | Apr 2025 | FantasyTalking | arXiv:2504.04842 |
| SkyReels V2 | Infinite-length text/image-to-video model with autoregressive stitching and cinematic control features. | — | Skywork AI | Apr 2025 | SkyReels | arXiv:2504.13074 |
| VACE | Unified framework for video creation and editing. Combines motion control, style, object manipulation, and more into one architecture. | — | Alibaba DAMO / Tongyi Lab | Mar 2025 | VACE | arXiv:2503.07598 |
| Wan 2.1 | First open-source model in the Wan series. 14B/1.3B versions. Handles text-to-video/image generation with strong object motion, scene consistency, and bilingual prompt support. | — | Alibaba PAI / Tongyi Lab | Feb 27, 2025 | Wan2.1 | Wan Paper (arXiv) |
| VEnhancer | Open-source video enhancer for sharpening and upscaling AI-generated videos. | — | Vchitect | Sep 2024 | VEnhancer | arXiv:2407.07667 |
| LivePortrait | Efficient portrait animation framework that transforms a single still image into a lifelike video with head/eye/face motion, and supports stitching and retargeting control for high-quality output. | — | Kuaishou Technology (KwaiVGI) | Jul 4, 2024 (code release) | LivePortrait | arXiv:2407.03168 |
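Many of the entries above build on the Wan 2.1/2.2 backbone, so running one locally gives a feel for the whole family. The sketch below is a minimal, non-authoritative text-to-video example assuming the Hugging Face `diffusers` integration (`WanPipeline`, `AutoencoderKLWan`) and a Diffusers-format checkpoint such as `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`; class names, checkpoint IDs, and defaults vary by version, and the official Wan repos ship their own inference scripts.

```python
# Minimal sketch: Wan 2.1 text-to-video via Hugging Face diffusers (assumed integration).
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"  # assumed Diffusers-format checkpoint ID

# The Wan VAE is commonly kept in float32 for stability; the transformer runs in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walking through tall grass, golden hour, realistic style",
    negative_prompt="blurry, low quality, distorted",
    height=480,
    width=832,
    num_frames=81,      # roughly 5 seconds at 16 fps
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "wan_t2v.mp4", fps=16)
```

Larger checkpoints (e.g. the 14B variants) follow the same pattern but need CPU offloading or a multi-GPU setup to fit in memory.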
Ready to test?
Try Wan Animate and LivePortrait
Upload a character, record a driving video, and let our pipeline handle motion transfer, lip sync, and model deployment for you.
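Prefer to run LivePortrait locally instead? The sketch below shows the general shape of a run. It assumes the KwaiVGI/LivePortrait repository is cloned with its dependencies installed; the `-s` (source image) and `-d` (driving video) flags follow the repo's README at the time of writing, and the file paths are placeholders.

```python
# Minimal sketch: animate a still portrait with LivePortrait's CLI from Python.
# Assumes the KwaiVGI/LivePortrait repo is cloned into ./LivePortrait and set up per its README.
import subprocess
from pathlib import Path

source_image = Path("assets/my_character.jpg").resolve()   # placeholder: the portrait to animate
driving_video = Path("assets/my_driving.mp4").resolve()    # placeholder: the recorded driving video

subprocess.run(
    [
        "python", "inference.py",
        "-s", str(source_image),   # source portrait
        "-d", str(driving_video),  # driving video whose motion is transferred
    ],
    cwd="LivePortrait",  # run from the cloned repo root
    check=True,
)
```

Results are written to the repo's default output directory; motion-transfer models such as WanAnimate follow a similar reference-plus-driving-video workflow, but with their own scripts and flags.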