Workflow: Image -> Autocaption (Prompt) -> WAN I2V with Upscale and Frame Interpolation and Video Extension
Creates video clips with up to 480p resolution (720p with the corresponding model)
There is a Florence caption version and an LTX Prompt Enhancer (LTXPE) version; LTXPE is heavier on VRAM.
The LTX Prompt Enhancer (LTXPE) might have issues with the latest Comfy and Lightricks updates.
MultiClip: Wan 2.1 I2V version supporting the Fusion X Lora to create clips with 8 steps and extend them up to 3 times; see the posted examples with 15-20 s length.
The workflow creates a clip from the input image and extends it with up to 3 clips/sequences. It uses a color match feature to ensure consistency in color and light in most cases. See the notes in the workflow for full details.
There is a normal version that lets you use your own prompts, and a version using LTXPE for auto-prompting. The normal version works well for specific or NSFW clips with Loras; the LTXPE version is made to simply drop in an image, set width/height, and hit run. The clips are combined into one full video at the end.
Update, 16th of July 2025: a new Lora, "LightX2v", has been released as an alternative to the Fusion X Lora. To use it, switch the Lora in the black "Lora Loader" node. It can create great motion with only 4-6 steps: https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/tree/main/loras
More info/tips & help: https://civarchive.com/models/1309065/wan-21-image-to-video-with-caption-and-postprocessing?dialog=commentThread&commentId=869306
V3.1: Wan 2.1. I2V Version supporting Fusion X Lora for fast processing
Fusion X Lora: processes the video with just 8 steps (or fewer, see notes in the workflow). It does not have the issues of the CausVid Lora from V3.0 and does not require a color match correction.
Fusion X Lora can be downloaded here: https://civarchive.com/models/1678575?modelVersionId=1900322 (i2V)
V3.0: Wan 2.1. I2V Version supporting Optimal Steps Scheduler (OSS) and CausVid Lora
OSS is a newer Comfy core node that allows a lower number of steps with a boost in quality: instead of using 50+ steps, you can get the same result with around 24 steps. https://github.com/bebebe666/OptimalSteps
CausVid uses a Lora to process the video with just 8-10 steps; it is fast, at a lower quality. The workflow contains a Color Match option in postprocessing to cope with the increased saturation the Lora introduces. The Lora can be downloaded here: https://huggingface.co/Kijai/WanVideo_comfy/tree/main
(Wan21_CausVid_14B_T2V_lora_rank32.safetensors)
Both have a version with Florence or LTX Prompt Enhancer (LTXPE) for captioning, can use Loras, and include Teacache.
V2.5: Wan 2.1. Image to Video with Lora Support and Skip Layer Guidance (improves motion)
There are two versions: Standard with Teacache, Florence caption, upscale, frame interpolation, etc., plus a version with the LTX Prompt Enhancer as an additional captioning tool (see notes for more info; requires custom nodes: https://github.com/Lightricks/ComfyUI-LTXVideo).
For Lora use, it is recommended to switch to your own prompt with the Lora trigger phrase; complex prompts might confuse some Loras.
V2.0: Wan 2.1 Image to Video with Teacache support for the GGUF model; speeds up generation by 30-40%
It renders the first steps at normal speed and the remaining steps at higher speed. There is a minor impact on quality with more complex motion. You can bypass the Teacache node with Ctrl-B.
Example clips with workflow in Metadata: https://civarchive.com/posts/13777557
Info and help with Teacache: https://civarchive.com/models/1309065/wan-21-image-to-video-with-caption-and-postprocessing?dialog=commentThread&commentId=724665
V1.0: WAN 2.1. Image to Video with Florence caption or own prompt plus upscale, frame interpolation and clip extend.
The workflow is set up to use a GGUF model.
When generating a clip you can choose to apply upscaling and/or frame interpolation. The upscale factor depends on the upscale model used (2x or 4x, see the "load upscale model" node). Frame interpolation is set to increase the frame rate from 16 fps (model standard) to 32 fps. The result is shown in the "Video Combine Final" node on the right, while the left node shows the unprocessed clip.
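The frame-rate math above can be sketched quickly (the helper name and the 81-frame example are illustrative assumptions; the 2x interpolation and 32 fps target come from the notes above):

```python
def interpolated_clip(num_frames: int, interp_factor: int = 2, output_fps: float = 32.0):
    """Frame interpolation multiplies the frame count; if the output fps is
    raised by the same factor, the playback duration stays unchanged."""
    total_frames = num_frames * interp_factor
    duration_s = total_frames / output_fps
    return total_frames, duration_s

# e.g. an 81-frame clip rendered at 16 fps (~5 s) becomes 162 frames at
# 32 fps, still ~5 s of playback, just smoother.
```

Lowering the output fps while keeping the interpolated frames is what produces the slow-motion effect mentioned in the tips below.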
It is recommended to use "Toggle Link Visibility" to hide the cables.
Models can be downloaded here:
Wan 2.1. I2V (480p): https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main
Clip (fp8): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/text_encoders
Clip Vision: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision
VAE: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
Wan 2.1. I2V (720p): https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf/tree/main
Wan 2.1 Text to Video (also works): https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main
Locations to save these files within your ComfyUI folder:
Wan GGUF Model -> models/unet
Textencoder -> models/clip
Clipvision -> models/clip_vision
Vae -> models/vae
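As a sanity check after downloading, a small Python sketch can confirm everything landed in the right subfolders. The filenames below are examples of the downloads listed above; substitute the exact variants you downloaded:

```python
from pathlib import Path

# Example filenames only -- use the quant/variant you actually downloaded.
EXPECTED = {
    "unet": "wan2.1-i2v-14b-480p-Q4_K_M.gguf",         # Wan GGUF model
    "clip": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",  # text encoder
    "clip_vision": "clip_vision_h.safetensors",
    "vae": "wan_2.1_vae.safetensors",
}

def missing_model_files(comfy_root: str) -> list:
    """Return relative paths of expected files missing under <root>/models."""
    models = Path(comfy_root) / "models"
    return [
        f"models/{sub}/{name}"
        for sub, name in EXPECTED.items()
        if not (models / sub / name).is_file()
    ]
```

Run it against your ComfyUI root folder; an empty list means all four files were found.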
Tips:
lower the framerate in the "Video Combine Final" node from 30 to 24 for a slow-motion effect
You can use the Text to Video GGUF Model, it will work as well.
If the video output shows strange artifacts on the far right side of a frame, try changing the "divisible_by" parameter in the "Define Width and Height" node from 8 to 16; this can latch on better to the standard Wan resolutions and avoid the artifacts.
see this thread if you face issues with LTX Prompt Enhancer: https://civarchive.com/models/1823416?dialog=commentThread&commentId=955337
Last Frame: If you face issues finding the pack for that node: https://github.com/DoctorDiffusion/ComfyUI-MediaMixer
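Regarding the "divisible_by" tip above: the node snaps each dimension to a multiple of the chosen value. A minimal sketch, assuming the node rounds down (the helper name is mine, not from the workflow):

```python
def snap_dimension(value: int, divisible_by: int) -> int:
    """Round a width/height down to the nearest multiple of divisible_by."""
    return (value // divisible_by) * divisible_by

# With divisible_by=8, a width of 856 passes through unchanged (856 = 107*8),
# but with divisible_by=16 it snaps to 848 (53*16), closer to the resolutions
# Wan was trained on, which can avoid the right-edge artifacts.
```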
WAN 2.2. TI2V 5b GGUF Model support
For NSFW, do I need the original photo to contain nudity? If I prompt a photo for NSFW, it just ignores it and does something else, non-NSFW.
Am I missing anything?
Well, in general I2V is best at keeping the original content and just animating it; it's not great at, nor really meant for, changing much of the base content. I recommend using Flux Fill to change the source image to your NSFW liking, then doing I2V.
If the LTX Prompt Enhancer from the experimental tab is causing issues (error: "Expected all tensors..."), see the thread below for a solution; this might occur with <16 GB VRAM:
https://civitai.com/models/995093?modelVersionId=1511863&dialog=commentThread&commentId=727932
More info: https://civitai.com/models/995093?modelVersionId=1511863&commentId=722660&dialog=commentThread
Sometimes the result looks like slow motion. How can I fix this, or is it normal behaviour?
P.S. Hope to see your version of TextToVideo
Agreed, sometimes the output looks like slow motion. I did not try it, but I assume the following could help: add text to the negative prompt (e.g. "slow motion"), or increase the framerate in the "Video Combine Final" node from 30 to maybe 40.
Text to Video is next on my list :)
Turns out the workflow works as Text to Video as well, by just using the T2V GGUF model: https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main
@tremolo28 Ha! So what should I do? Just disable the image input and use my own prompt?
@GrandpaFrost Just load the T2V model and use your own prompt, or insert an image and let Florence do the job.
@tremolo28 Hmmm, I have loaded "wan2.1-t2v-14b-Q5_K_M.gguf"
and am getting
"Unexpected architecture type in GGUF file, expected one of flux, sd1, sdxl, t5encoder but got 'pig'"
For example, I use "wan2.1-i2v-14b-480p-Q4_K_M.gguf" for I2V.
So which one do I need to download?
@GrandpaFrost Yes, this model should work: "wan2.1-t2v-14b-Q5_K_M.gguf". I used Q4, but it shouldn't matter. Anyway, you might need to keep an input image in the workflow, even if you use your own prompt.
@tremolo28 With wan2.1-t2v-14b-Q4_K_M.gguf it worked.
I'm trying to add your workflow via the "resources" button, but recently it stopped showing up there. However, in some posts I was able to add it. Did you encounter this, and do you maybe have a solution?
I load/save workflows just from the default directory
Praying WAN comes to Stable. Comfy seems to require a copy of my checkpoints and LoRAs and I can't keep doubling 6.4GB.
Modify the "extra_model_paths.yaml" file located in the ComfyUI folder and point the respective models and loras path entries to your Stable Diffusion models/loras folders. I also pointed the VAE, ControlNet models, and such this way.
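For reference, a minimal sketch of such an extra_model_paths.yaml (the base_path is a placeholder and the folder names assume a default A1111 layout; ComfyUI ships an extra_model_paths.yaml.example with the full key set):

```yaml
a111:
    base_path: path/to/stable-diffusion-webui/
    checkpoints: models/Stable-diffusion
    vae: models/VAE
    loras: models/Lora
    controlnet: models/ControlNet
```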
I have my own setup that works without any problems. I tried yours, and like other extended setups it says: "mat1 and mat2 shapes cannot be multiplied (154x768 and 4096x5120)". Can someone please advise me how to solve this problem? Thank you very much.
Sorry, my fault, I wasn't using the scaled CLIP model. But it helped me understand the system better. Thank you very much for your work.
Today i have an error:
DownloadAndLoadFlorence2Model
Unrecognized configuration class <class 'transformers_modules.Florence-2-large.configuration_florence2.Florence2LanguageConfig'> for this kind of AutoModel: AutoModelForCausalLM. Model type should be one of AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DiffLlamaConfig, ElectraConfig, Emu3Config, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, GitConfig, GlmConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeSharedConfig, HeliumConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, ZambaConfig, Zamba2Config.
Related post:
Cool, another update breaking custom nodes… Anyway, according to the linked thread, a fix seems to be in progress.
any fix for this yet??
@CaulShivers Apparently not; I just turn off the nodes to bypass them. There is another problem: after recent updates, video generations are taking longer again. The Manager shows my GPU fully used, but the temperature is not rising as much as it did before.
@tremolo28 Temporary FIX
https://github.com/kijai/ComfyUI-Florence2/issues/135
I was able to get it working by following the steps to replace "AutoModelForCausalLM" with "AutoModelForSeq2SeqLM", as mentioned in the comment below.
I also tried downgrading transformers (python -m pip install --upgrade transformers==4.49.0) as they suggest, but I don't believe that actually did anything for me (I think the downgrade failed).
https://github.com/kijai/ComfyUI-Florence2/issues/134#issuecomment-2745372425
@Quackquack I use StabilityMatrix, so I ran the command line from the "...\StabilityMatrix\Packages\ComfyUI\venv\Scripts" folder. It worked for me. Thanks for sharing, man.
They fixed it.
I love your workflows. I've been following your posts since the LTX models. Your workflows are always working, even though I only have 12GB of video memory.)))
thanks, mate
Finally made my first Wan music video. It is me and the lads.
Can anyone resolve this? Missing node types: WanImageToVideo. I updated my ComfyUI version, but that didn't solve it.
WanImageToVideo is a Comfy core node. You might be on an outdated Comfy version.
Current version (March 27th):
ComfyUI: v0.3.27-6-g3661c833
(2025-03-26)
Manager: V3.31.8
@tremolo28 i have solved my problem,thanks a lot!
@zczcg How did you resolve the problem? My Manager cannot find the missing nodes.
@cbm27 You must update your ComfyUI; run \ComfyUI_windows_portable\update\update_comfyui.bat
It is very good. It would be great to add Loras and SageAttention.
Great workflow! I would also like to know where/how to add Loras.
I might release an update this weekend for Lora support.
For now, you could add a "LoraLoaderModelOnly" node, place it after the unet loader, and add the Lora trigger word to the Pre- or After-text node. I just tried that with the "squish" Wan Lora and it worked.
Awesome, thank you! I tried something similar but it wasn't working. I'll give it a go. Do you think multiple Loras would work?
@3rdny467 I think daisy-chaining the Lora loaders would work, but it might be overkill to use 2+ video Loras.