This workflow lets you extract the motion of a reference video via ControlNet (pose, canny, or depth) and apply that motion to an image of your choice to create a new video.
Resources you need:
📂 Files:
Recommendation (a rough size estimate per quant is sketched after the file list):
>24 GB VRAM: base or Q8_0
16 GB VRAM: Q5_K_S
<12 GB VRAM: Q4_K_S
For base version
VACE Model: wan2.1_vace_14B_fp8_e4m3fn.safetensors or wan2.1_vace_1.3B_fp16.safetensors
in models/diffusion_models
For GGUF version
VACE Quant Model: Wan2.1-VACE-14B-QX_0.gguf
in models/diffusion_models
CLIP: umt5_xxl_fp8_e4m3fn_scaled.safetensors
in models/clip
VAE: wan_2.1_vae.safetensors
in models/vae
LoRA: Wan21_CausVid_14B_T2V_lora_rank32.safetensors
in models/loras
Any upscale model:
Realistic: RealESRGAN_x4plus.pth
Anime: RealESRGAN_x4plus_anime_6B.pth
in models/upscale_models
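As a rough sanity check on the VRAM recommendations above, you can estimate a quant's weight size from the parameter count and its effective bits per weight. A minimal sketch in Python; the bits-per-weight figures are approximate averages for each quant type, not exact values:

```python
# Approximate on-disk / in-VRAM size of the model weights alone.
# Bits-per-weight values are rough averages per GGUF quant type (assumption).
BITS_PER_WEIGHT = {"fp16": 16.0, "Q8_0": 8.5, "Q5_K_S": 5.5, "Q4_K_S": 4.5}

def weight_size_gb(params_billion: float, quant: str) -> float:
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"14B @ {quant}: ~{weight_size_gb(14, quant):.1f} GB")
# fp16 ~28.0 GB, Q8_0 ~14.9 GB, Q5_K_S ~9.6 GB, Q4_K_S ~7.9 GB
```

Activations, the UMT5-XXL text encoder, and the VAE all sit on top of the weights, which is why Q8_0 is only comfortable above 24 GB.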
📦 Custom Nodes:

Comments
Great workflow as always. Which GGUF would work on 24GB VRAM? Also, does VACE work with split sigmas?
I've done some testing and the split sigma isn't working at the moment. I've tried to publish a working version but I still have a lot of testing to do to optimize.
8GB VRAM + 32GB RAM possible with Q4?
This will be tricky, but you should try loading as much of the model as possible into RAM.
The legend returns! I can't wait to test this one out, thank you UmeAiRT!!
literally thought i wrote this. but agreed
Excellent work! I am delighted with this workflow and the excellent images it helps produce. Thank you for your hard work!
I was waiting for this before I tried VACE. Works brilliantly and is presented amazingly as well. Thanks!
Thank you very much. Can you share a good workflow for the recently released CausVid and AccVid acceleration LoRAs? I have tried the ones released on the C site, but they are not easy to use. I look forward to your update.
In this workflow I added the CausVid LoRA; does this method not seem easy to use?
@UmeAiRT Thanks, I'll try it
Hello @UmeAiRT, I really love your workflow. Small question: did you try something like this one: https://civitai.com/models/1622023/causvid-2-sampler-workflow-for-wan-480p720p-i2v?modelVersionId=1835720
I read that CausVid should use different values.
Unfortunately I'm not as expert as you in this domain.
Maybe you could add such a feature to your different workflows.
Everything ends on a StringConcatenate node; the control preview works and shows the result, then nothing happens.
Failed to validate prompt for output 398:
* StringConcatenate 511:
- Failed to convert an input value to a INT value: frame_c, , invalid literal for int() with base 10: ' '
- Failed to convert an input value to a INT value: frame_b, , invalid literal for int() with base 10: ''
- Required input is missing: text_a
- Required input is missing: text_b
- Failed to convert an input value to a INT value: frame_a, , invalid literal for int() with base 10: ''
Output will be ignored
Failed to validate prompt for output 413:
Output will be ignored
Prompt executed in 0.41 seconds
I was having a similar problem with another one of Ume's (excellent) workflows, and my workaround was to connect the positive prompt directly to the positive encode node. I think the problem occurs when combining the Florence auto-prompt with the regular prompt. If you aren't using the auto-prompt anyway, connecting directly won't compromise anything else.
This bug is related to ComfyUI versions that changed the "concatenate" node. In general you need the latest version of ComfyUI for this to work, or right-click the node and choose "fix node".
Here is another error in the console:
File "F:\AI\ComfyUI_windows_portable\ComfyUI\comfy\ldm\wan\model.py", line 244, in forward
c = self.before_proj(c) + x
~~~~~~~~~~~~~~~~~~~~^~~
RuntimeError: The size of tensor a (46620) must match the size of tensor b (47880) at non-singleton dimension 1
Interestingly, it only works for me at 480x480; as soon as I change one resolution parameter, the error above appears.
Just a reminder that the CLIP model can also be GGUF:
https://huggingface.co/city96/umt5-xxl-encoder-gguf/tree/main
Congratulations on your work, it's very good. I'm really enjoying it.
Thanks again for your work! However, the output video keeps coming out as dense brown fog. What settings am I missing here?
(I'm using DWPose, and the ControlNet video renders fine from my input video; it's "only" the actual output video that doesn't show.)
Now the output is a yellow-orange slow-paced video of two women's faces!? So weird... I've tried all models and every apparent setting; nothing changes.
jay_rich, I have the same thing
I also have the same problem
The dynamic range drops a lot and the blacks get crushed. What setting should I adjust? Thank you as always.
Excellent work!
Question: what setting can I change to keep a higher similarity with the input image and really only take the movement from the video? The style changed too much for my taste. Thank you.
I have the same question. I ran with the default settings (CFG 1.0, a combination of DWPose, Depth, and Canny enabled) and got something kind of close to my input image, but it was still a different style than I intended. When I switched to just DWPose, which is what I use with some success in the other popular non-VACE WAN ControlNet workflow, it turned my cartoony source image into a real person in the output video lol.
If we had control over the ControlNet strength for video like we do for images, then I feel like that would allow us to get closer to the source image.
It's been about a month since your comment though. Were you able to figure this out, either with this workflow or a different one?
So I just figured out that the CausVid_T2V lora the guide for this workflow suggests causes the realistic output I was getting. Without that, I get much closer to the original style but it could still be pushed much further.
I'm trying to mess with the strength value in the WanVaceToVideo node to see if that helps.
I've always wanted to know which is better, the base or the GGUF version? What are the main differences between them?
To my knowledge: GGUF is compressed, loads slower, and loses some quality. Base is larger; it is faster in the sense that it doesn't need to be decompressed at runtime, but it requires more VRAM and also takes somewhat longer to load into VRAM because it's bigger. GGUF may be slightly worse across the board in a direct comparison. For image GGUFs, I remember Q8 being basically the default, Q5 the first step with slight quality loss, and Q4 still acceptable; on a 10GB VRAM GPU, for example, you had to use Q4 because it was the only model under 10GB. In terms of workflow, the base one has no GGUF loader nodes included, so if you use GGUF quants you need the GGUF workflow.
@lost_moon Thanks for the detailed answer! I was confused because both methods give me the same processing time.
String Concatenate broken
mxslider2d nodes just look blank; if I do fix v2 they reappear for a second, then disappear again. Anyone know how to fix this?
Hi! Please help me with this!!!!
KSampler
mat1 and mat2 shapes cannot be multiplied (154x768 and 4096x5120)
For me that's usually a sign of the wrong text encoder: the 768 suggests a CLIP-L-style model was loaded, while WAN expects UMT5-XXL embeddings (see the shape sketch below). Try changing the CLIP model used.
@arandomuser2839 Thanks!!
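For context on why those numbers point at the text encoder: the first matrix is the text-embedding batch (sequence length 154, hidden size 768, i.e. a CLIP-L-style encoder), and the second is the model's input projection, which expects 4096-dim UMT5-XXL embeddings. A tiny PyTorch illustration; the Linear layer here is a hypothetical stand-in, with shapes taken from the error:

```python
import torch

proj = torch.nn.Linear(4096, 5120)   # stand-in for WAN's text-input projection

wrong_emb = torch.randn(154, 768)    # CLIP-L-style output: hidden size 768
try:
    proj(wrong_emb)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (154x768 and 4096x5120)

right_emb = torch.randn(154, 4096)   # UMT5-XXL hidden size
print(proj(right_emb).shape)         # torch.Size([154, 5120]) -- works
```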
10/10
Maybe it works well, but it is not simple at all.
"Simple" is the name of my workflow series; advanced things like VACE can look complex, but you just have to import one image and one video and everything is automatic.
Really great workflow. How can I make the generated video keep the same style as my input image? I used the auto install script and left all the settings at default, but the output still changes a lot from the original image.
No matter the settings, it's either an OOM or a "The size of tensor a (60840) must match the size of tensor b (62010) at non-singleton dimension 1" error.
Maxed out virtual memory on a 24GB VRAM card.
Essentially, with Q8 the workflow is using upwards of 50GB of memory for some magical reason.
Anyone else having this?
I had the same issue, but I resolved it by using a standard resolution like 480p or 720p, and it worked for me. (RTX 3090)
It worked for me. If you have encountered an error like "The size of tensor a (####) must match the size of tensor b (####) at non-singleton dimension 1" at the KSampler, your latent image size is likely the issue (see the sketch below).
Try a standard resolution such as 480p or 720p. I used the exact model/LoRA/CLIP/VAE that the OP listed and ran this workflow on an RTX 3090 (base version).
Thank you for creating such a simple and easy workflow, OP!
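For anyone hitting the tensor-size mismatch above: WAN 2.1's VAE downsamples spatially by 8 and the transformer patchifies the latent 2x2, so pixel width and height effectively need to be multiples of 16, and the control video must resolve to the same latent size as the output. A small helper to snap a requested resolution to valid values, assuming that divisibility requirement:

```python
# Snap a requested resolution to dimensions WAN 2.1 can patchify cleanly.
# Assumption: VAE stride 8 x patch size 2 => pixel dims should be multiples of 16.
MULTIPLE = 16

def snap(value: int, multiple: int = MULTIPLE) -> int:
    # Round to the nearest valid multiple, never below one multiple.
    return max(multiple, round(value / multiple) * multiple)

def snap_resolution(width: int, height: int) -> tuple[int, int]:
    return snap(width), snap(height)

print(snap_resolution(480, 480))  # (480, 480) -- already valid
print(snap_resolution(852, 480))  # (848, 480) -- 852 is not a multiple of 16
```

If both the control video and the output latent go through the same snapped resolution, the token counts on the two sides of `before_proj(c) + x` should line up, which is likely why the standard 480p/720p presets work.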