Open Source AI Video Model Directory

Curated directory of open source AI video generators, editors, and avatar models

43 Tracked Models

Open-source video model directory, sorted by release date (newest first)
Model | Description | Tutorial | Creator | Release date | GitHub / Repo | Paper / Docs
LTX-2 | DiT-based audio-video foundation model. Text-to-video and image-to-video, with outputs up to 4K resolution. A Wan 2.2 competitor. | YouTube | Lightricks | Jan 2026 | LTX-2 | LTX-2 Technical Report
StoryMem | Multi-shot long-video storytelling with memory. Built on Wan 2.2 14B. | - | S-Lab, Nanyang Technological University & ByteDance | Dec 22, 2025 | StoryMem | arXiv:2512.19539
LongCat Video Avatar | Speech-driven talking-avatar model built on the LongCat stack; turns a single reference portrait plus input audio into a video avatar. | - | MeiGen-AI | Dec 16, 2025 | LongCat Video Avatar | Tech Report
Wan-Move | Adds point-level motion control to the Wan 2.1 image-to-video model: drag or trace trajectories and the generated video follows them. | - | Alibaba PAI / Tongyi Lab | Dec 10, 2025 | Wan-Move | arXiv:2512.08765
SCAIL | Animates characters from a control video. Strong at multi-character dance videos; does not capture detailed facial movement. Built on Wan 2.1. | YouTube | ZAI | Dec 5, 2025 | SCAIL | arXiv:2512.05905
LiveAvatar | Real-time talking-avatar model built on Wan 2.2 S2V. Turns a reference face plus audio into smooth streaming video tuned for low-latency playback. | - | Alibaba Quark | Dec 4, 2025 | LiveAvatar | arXiv:2512.04677
One-to-All-Animation | Portrait animation that improves alignment between the reference character and the control video's motion. Built on Wan 2.1. | - | Jiangnan Univ; USTC; Chinese Academy of Sciences; BUPT; Zhejiang Univ | Nov 28, 2025 | One-to-All-Animation | arXiv:2511.22940
SteadyDancer | Takes a reference image and a control video and outputs the reference character performing the control video's motion. Good for dance and other full-body movement. Built on Wan 2.1. | YouTube | MCG, Nanjing Univ. & Tencent PCG | Nov 24, 2025 | SteadyDancer | arXiv:2511.19320
HunyuanVideo-1.5 | Lightweight 8.3B-parameter video generation model supporting text-to-video and image-to-video, using SSTA attention, a 3D causal VAE, and 1080p super-resolution. | - | Tencent Hunyuan Lab | Nov 21, 2025 | HunyuanVideo-1.5 | Report
SAM 3 | Unified promptable segmentation model for images and videos: supports text/box/mask prompts, open-vocabulary concepts, and consistent video tracking. | - | Meta AI / Facebook Research | Nov 19, 2025 | SAM3 | SAM 3 Paper
FFGO | "First Frame Is the Place to Go": lightweight video content customization that treats the first frame as a memory buffer for mixing multiple reference images, going from an image board to a video containing the referenced subjects. Can be used as a LoRA with Wan 2.2. | YouTube | Univ. of Maryland / USC / MIT | Nov 19, 2025 | FFGO-Video-Customization | arXiv:2511.15700
Time-to-Move (TTM) | A simple add-on that helps image-to-video models produce smoother, more natural motion. You guide the movement with simple inputs (arrows, poses, or example motion), and TTM applies it while keeping the source image's look consistent. | - | A. Singer, N. Rotstein, A. Mann, R. Kimmel, O. Litany | Nov 9, 2025 | TTM | arXiv:2511.08633
LongCat-Video | 13.6B-parameter foundation video generator with long-term coherence. Supports text, image, and video inputs as well as video continuation. | - | Meituan | Oct 28, 2025 | LongCat-Video | arXiv:2510.22200
Ditto | Instruction-based video editing framework. Supports high-fidelity scene, subject, and style edits from natural-language instructions, built on a large curated instruction dataset for video editing. | - | Ditto Team (EzioBy) | Oct 17, 2025 | github.com/EzioBy/Ditto | arXiv:2510.15742
FlashVSR | Real-time 4K video upscaling with diffusion. | - | OpenImagingLab | Oct 14, 2025 | FlashVSR | arXiv:2510.12747
MoCha | End-to-end character replacement in video without keypoints. Requires only a first-frame mask and a reference image. | - | Orange 3DV Team | Oct 2025 | MoCha | arXiv:2503.23307
Krea-Realtime | Real-time generative video model and toolkit for live prompting and streaming outputs. Krea Realtime 14B is distilled from the Wan 2.1 14B text-to-video model using Self-Forcing, a technique for converting regular video diffusion models into autoregressive models. | YouTube | Krea AI | Oct 2025 | Krea-Realtime | Krea Realtime 14B Blog
Ovi | Video + audio generation from text/image prompts. Twin diffusion backbone for video and audio. | YouTube | Character AI | Sep 30, 2025 | Ovi | arXiv:2510.01284
Wan-Alpha | High-quality text-to-video model supporting alpha-channel / transparent-background outputs. Built on the Wan 2.1 T2V 14B backbone with LightX2V for fast inference and alpha compositing. | YouTube | WeChatCV (WeChat CV Lab) | Sep 30, 2025 (v1.0) | WeChatCV/Wan-Alpha | arXiv:2509.24979
WanAnimate | Character animation and replacement model using a video as reference. Integrated with Wan 2.2. | YouTube | Alibaba Tongyi Lab | Sep 19, 2025 | Wan2.2 Animate | arXiv:2509.14055
Lynx | High-fidelity personalized video generation focused on identity preservation. Generates new videos of a specific person from one reference image, using ID-adapters and Ref-adapters for facial detail control. | - | ByteDance | Sep 18, 2025 | github.com/bytedance/lynx | arXiv:2509.15496
Lucy Edit | Text-guided video editing model enabling object, style, character, and scene edits while preserving the original motion. Built on a Wan 2.2 5B architecture with efficient edit conditioning. | - | DecartAI | Sep 18, 2025 | github.com/DecartAI/Lucy-Edit-ComfyUI | Lucy Edit Paper
HuMo | Multimodal (text/image/audio) model for talking-human videos with strong subject and lip-sync consistency. | - | ByteDance | Sep 10, 2025 | HuMo | arXiv:2509.08519
Stand-In | Plug-and-play module for maintaining facial identity during video generation across scenes or styles. | - | Tencent WeChat CV Lab | Sep 2025 | Stand-In | arXiv:2508.07901
InfiniteTalk | Audio-driven long-form talking-video generator. Produces image-to-video and video-to-video talking portraits with full-body, head, and lip synchronization; supports unlimited video length and sparse-frame generation. | YouTube | MeiGen-AI | Aug 19, 2025 | github.com/MeiGen-AI/InfiniteTalk | arXiv:2508.14033
Wan 2.2 (14B) | Second-generation Wan model with Mixture-of-Experts. Enables cinematic 720p videos with better aesthetic and physical control. | - | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314
Wan 2.2 (5B) | Lightweight dense version of Wan 2.2 with a 3D-aware VAE. Can generate 5-second 720p/24 fps video on a single high-end GPU. | YouTube | Alibaba PAI / Tongyi Lab | Jul 29, 2025 | Wan2.2 | arXiv:2503.20314
ReCamMaster | Novel-view video generation from camera trajectory input. Re-renders existing videos with new camera motion. | - | Kuaishou & Zhejiang Univ | Jul 9, 2025 | ReCamMaster | arXiv:2503.11647
FantasyPortrait | Multi-character portrait animation with expression-level control and synchronized expressions across faces. | YouTube | Alibaba AMAP Lab | Jul 2025 | FantasyPortrait | arXiv:2507.12956
EchoShot | Multi-shot video generation of the same subject with coherent identity across shots. | - | Beihang Univ / D2I Lab | Jul 2025 | EchoShot | arXiv:2506.15838
MTVCraft | Audio-video generation framework that splits a text prompt into separate sound streams and generates visuals aligned to them. | - | BAAI | Jun 2025 | MTVCraft | arXiv:2506.08003
Phantom | Identity-preserving text-plus-image-to-video framework. Integrates with the Wan backbone and uses multi-subject memory. | - | ByteDance | May 27, 2025 | Phantom | arXiv:2502.11079
ATI | Adds trajectory control to Wan models via a lightweight conditioning layer. | - | ByteDance | May 2025 | ATI | arXiv:2505.22944
MiniMax-Remover | Object removal model trained with minimax optimization and distilled for fast inference. | - | Fudan Univ & Tencent | May 2025 | MiniMax-Remover | arXiv:2505.24873
MultiTalk | Audio-driven multi-character video generation framework. Supports distinct voices and identity-mapped lip sync. | YouTube | MeiGen-AI | May 2025 | MultiTalk | arXiv:2505.22647
Hunyuan Avatar | Multi-character audio-driven avatar video generator. Supports emotion-aware speech animation, multi-speaker dialog videos, and realistic expression/motion via a multimodal diffusion transformer. | - | Tencent Hunyuan Lab | May 2025 | github.com/Tencent-Hunyuan/HunyuanVideo-Avatar | arXiv:2505.20156
Uni3C | 3D-enhanced model with simultaneous camera and human-pose control for video generation. | - | Alibaba DAMO | Apr 2025 | Uni3C | arXiv:2504.14899
FantasyTalking | Talking-head video generator driven by a portrait plus audio. Includes body/gesture motion and emotion control. | - | Alibaba AMAP Lab | Apr 2025 | FantasyTalking | arXiv:2504.04842
SkyReels V2 | Infinite-length text/image-to-video model with autoregressive stitching and cinematic control features. | - | Skywork AI | Apr 2025 | SkyReels | arXiv:2504.13074
VACE | Unified framework for video creation and editing. Combines motion control, style, object manipulation, and more in one architecture. | YouTube | Alibaba DAMO / Tongyi Lab | Mar 2025 | VACE | arXiv:2503.07598
Wan 2.1 | First open-source model in the Wan series, in 14B and 1.3B versions. Handles text-to-video and image-to-video generation with strong object motion, scene consistency, and bilingual prompt support. | - | Alibaba PAI / Tongyi Lab | Feb 27, 2025 | Wan2.1 | Wan Paper (arXiv)
VEnhancer | Open-source video enhancer for sharpening and upscaling AI-generated videos. | YouTube | Vchitect | Sep 2024 | VEnhancer | arXiv:2407.07667
LivePortrait | Efficient portrait animation framework that turns a single still image into a lifelike video with head, eye, and face motion; supports stitching and retargeting control for high-quality output. | - | Kuaishou Technology (KwaiVGI) | Jul 4, 2024 (code release) | LivePortrait | arXiv:2407.03168
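
Many entries above build on the Wan backbone, so a quick local smoke test often looks similar across models. Below is a minimal text-to-video sketch assuming the Hugging Face diffusers Wan integration and the Wan-AI/Wan2.1-T2V-1.3B-Diffusers checkpoint; the model id, resolution, and sampling settings are assumptions to adapt from each repo's README, not a canonical recipe.

# Minimal Wan 2.1 text-to-video sketch (assumes a diffusers version with Wan support).
# Model id and generation settings are assumptions; verify against the repo's README.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"

# The Wan VAE is commonly loaded in float32 for numerical stability,
# while the rest of the pipeline runs in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walking through tall grass, golden hour, realistic style",
    negative_prompt="blurry, distorted, low quality",
    height=480,
    width=832,
    num_frames=81,      # roughly 5 seconds at Wan's native 16 fps
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=16)

The Wan-based entries in the table (Wan-Move, FFGO, SCAIL, Krea Realtime, and others) each document their own wrappers and extra conditioning inputs on top of this kind of pipeline.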

Ready to test?

Try Wan Animate and LivePortrait

Upload a character, record a driving video, and let our pipeline handle motion transfer, lip sync, and model deployment for you.
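
If you would rather experiment locally before uploading anything, the same diffusers integration also exposes an image-to-video pipeline, which approximates the first step of a character animation workflow. Again a hedged sketch: the checkpoint id and resize values are assumptions, and Wan Animate and LivePortrait each ship their own dedicated inference scripts in their repos.

# Minimal Wan 2.1 image-to-video sketch (assumed checkpoint id; verify on Hugging Face).
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Any portrait or character image; 832x480 matches the 480P checkpoint.
image = load_image("character.png").resize((832, 480))

frames = pipe(
    image=image,
    prompt="The character turns their head and smiles",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_i2v.mp4", fps=16)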