Wan 2.2 Video + Sound workflow optimized for RTX 3060 12 GB VRAM GPU

Wan 2.2 Video + Sound workflow optimized for RTX 3060 12 GB VRAM GPU - v1.1 (no SageAttention)

NSFW

[Edit:

Version v5.0 works with the latest comfyui (v0.15.0).

If you have any problems, please refer to the FAQ at the bottom of the page or have a look in the comments.

Many thanks to everyone who tested this workflow. Thank you very much for the many inquiries and, of course, for all the knowledge and experience you have contributed here 👍🙂

Special thanks to:

@SeoulSeeker for the "Dead Simple MMAudio" workflow, which is the basis of the audio part here,

@taek75799 for the really well working enhanced models

@Bakazaya for pointing to the color issue in version v3.0 and running lots of tests,

@bluntfeather for sharing the latest experiences with installing Comfyui-Easy-Install,

@nitrovtx for remaining persistent in matters of quality and running a lot of tests,

@Icey64 for providing the link to "Comfyui-Easy Install",

@boinobin730 for asking for a First to Last Frame option, running pre tests and responding fast as hell 🙂 and

@SnowShoes311 thank you so much again for all your buzzing 😋]

Features:

Optimized Wan 2.2 workflow, runs well on an RTX 3060 12 GB VRAM GPU with 32 GB RAM,
"Text to Video", "Image to Video" and "First/Last Frame 2 Video" generation in one workflow, all with easy audio generation,
easy installation/model downloading, all necessary sources are specified,
easy to use workflow, clearly structured, all necessary steps are explained,
easy switches for mode selection,
easy prompt selection for fast prompt creation/testing,
easy switching between "standard" and "enhanced" models,
very fast and smooth high-quality outputs up to approx. 1440 x 960 at 60 fps,
2x fast upscaler,
4x fast framerate multiplier,
MMAudio Sampler (generates sound that matches the action in the video),
Triton and Sage Attention option,
a 5-second high-quality video takes about 10 - 15 minutes to generate (see below).

Tested generation times:

As a rough guide for an RTX 3060 GPU, generating a 5-second high-quality 1440 x 960 60 fps video with 6 steps takes:

t2v: around 10 - 12 minutes,
i2v: around 15 minutes.
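
The output figures above can be traced back to what the sampler itself has to render. The following is only a rough sketch: it assumes that the quoted 1440 x 960 at 60 fps is the final result after both the 2x upscaler and the 4x framerate multiplier have run, and the names `OutputSpec` and `base_generation_settings` are made up for illustration. The workflow's actual internal settings may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSpec:
    """Final video properties as quoted in the description."""
    width: int
    height: int
    fps: int


def base_generation_settings(final, upscale=2, frame_multiplier=4):
    """Work backwards from the final output to the pre-upscale,
    pre-interpolation values (assumption: both stages are applied)."""
    return {
        "width": final.width // upscale,
        "height": final.height // upscale,
        "fps": final.fps / frame_multiplier,
    }


settings = base_generation_settings(OutputSpec(width=1440, height=960, fps=60))
print(settings)
```

Under these assumptions the expensive sampling step only has to produce a 720 x 480 clip at a quarter of the final frame rate, which is why the result fits on a 12 GB card.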

Comfyui-Easy-Install with Triton + SageAttention:

This workflow should work with any recent comfyui version newer than v0.6.0 (Desktop, Embedded, Windows/Linux).

However, comfyui is developing rapidly, and it often happens that some of the custom nodes used are not updated quickly enough or not updated at all. Manual workarounds are sometimes necessary. Furthermore, care must be taken to ensure that there are no conflicts with other nodes.

If you're having difficulties with your existing comfyui system, or if you want to run video generation on a separate (parallel) comfyui system like I do, I recommend the following installer: https://github.com/Tavris1/ComfyUI-Easy-Install.

A complete installation of comfyui, including the manager and some preconfigured custom nodes, takes just one click - really 🙂
Installing Triton + SageAttention takes just a second click - really 🙂 And since it's so easy now, I would definitely recommend it for video generation.
Because it is an embedded version, you can install it in parallel to your existing comfyui version without the risk of ruining your working system.
After installation, just configure the "extra_model_paths.yaml" file to use your existing models.
After a fresh installation of Comfyui-Easy-Install you might still run into some issues, but there are known workarounds - please see the FAQ below.
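
To make the "extra_model_paths.yaml" step more concrete, here is a small sketch that prints a candidate file body. Everything specific in it is an assumption: the path `D:/ComfyUI_existing/` is a placeholder for your existing installation, the helper `render_extra_model_paths` is made up for this example, and the folder keys follow the `extra_model_paths.yaml.example` file shipped with comfyui as far as I know it. Please compare the output with that example file before using it.

```python
def render_extra_model_paths(base_path, folders):
    """Build the text of an extra_model_paths.yaml section that points
    a second comfyui install at the model folders of an existing one."""
    lines = ["comfyui:", f"    base_path: {base_path}"]
    for key, relative_folder in folders.items():
        lines.append(f"    {key}: {relative_folder}")
    return "\n".join(lines) + "\n"


# Hypothetical location of the existing comfyui installation.
EXISTING_INSTALL = "D:/ComfyUI_existing/"

# Assumed folder keys; verify them against extra_model_paths.yaml.example.
FOLDERS = {
    "diffusion_models": "models/diffusion_models/",
    "vae": "models/vae/",
    "loras": "models/loras/",
    "clip": "models/clip/",
    "upscale_models": "models/upscale_models/",
}

yaml_text = render_extra_model_paths(EXISTING_INSTALL, FOLDERS)
print(yaml_text)
```

The printed text can be pasted into "extra_model_paths.yaml" in the new installation, so the large model files do not have to be copied or downloaded twice.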

For testing/understanding/experimenting/changing the workflow:

Click "Toggle Link Visibility" to see the links,
click the Subgraph symbols to open the Subgraphs,
for quick testing you may lower the settings for steps, clip length and video resolution,
be really careful when modifying Groups or Subgroups (even Title or Color), because they are essential for switching,
feel free to try and test other models. Just give me a hint if you find models that deliver better results and fit within the 12 GB VRAM limit.

And as usual: Have Fun 🙂🙂

Short Conclusion:

This workflow is based on elements of a variety of already published workflows. My "job" was only to put things together, optimize them for a small machine and create a simple and hopefully user-friendly, even "beginner"-friendly, workflow.

I'm not an "expert" - just a user who wants to get it running on "available" hardware.

There are many things I don't really understand. If you find mistakes or better solutions, please give me a hint.

And I really hope that even "beginners" get a chance to take their first steps...

Frequently Asked Questions (FAQ):

For a quick and better overview I will try to collect all known issues here - step by step (please be patient). If your issue is not listed here, please have a look in the comments first. Most issues have already been discussed.

Comfyui Nodes 2.0:

Turn off Nodes 2.0 in comfyui (use the comfyui menu). Currently, not all custom nodes support it.

Comfyui crashes without any error report after generation, during VAE decode, upscaling or frame rate multiplying (Rife VFI):

This is a RAM problem (not VRAM). Increase your swap file (minimum 64 to 128 GB) or set it to automatic management on a fast drive with at least 100 GB of free space.
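
Before moving the swap file, it can help to check whether the target drive really has the 100 GB of free space mentioned above. This is a minimal sketch using only the Python standard library; the helper names are made up for this example, and `"."` simply means the drive of the current folder, so replace it with the drive you actually intend to use.

```python
import shutil

# Free space suggested in the FAQ entry above, in GB.
MIN_FREE_GB = 100


def free_space_gb(path):
    """Return the free space of the drive that contains `path`, in GB."""
    return shutil.disk_usage(path).free / 1024**3


def has_room_for_swap(path, min_free_gb=MIN_FREE_GB):
    """True if the drive has at least `min_free_gb` GB free."""
    return free_space_gb(path) >= min_free_gb


drive = "."  # placeholder: use the drive that should hold the swap file
print(f"{free_space_gb(drive):.1f} GB free, enough: {has_room_for_swap(drive)}")
```

The script only reports free space; changing the swap file size itself still has to be done in the operating system settings.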

JW Nodes (JWFloatToInteger, JWIntergerDiv, JWImageResizeByLongerSide), soundfile missing:

For the workaround look here and here:

python -m pip install soundfile

Fresh Comfyui-Easy-Install installation (missing soundfile and PyTorch v2.9.0 issue with SageAttention on Windows):

For the full conversation look here.

Open cmd in the python_embedded folder:

python -m pip install soundfile

python -m pip uninstall -y torch torchvision torchaudio

python -m pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu126
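
To confirm that the commands above had the intended effect, the installed versions can be checked from the same embedded Python. This is only a sketch: the helper names `installed_version` and `check_packages` are made up here, and the expected versions are simply the ones from the pip command above. Local build suffixes such as `+cu126` are ignored in the comparison.

```python
from importlib import metadata

# Versions taken from the pip command above; None means "any version".
EXPECTED = {
    "torch": "2.8.0",
    "torchvision": "0.23.0",
    "torchaudio": "2.8.0",
    "soundfile": None,
}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_packages(expected):
    """Map each package to (found_version, status)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            status = "missing"
        elif wanted is None or found.split("+")[0] == wanted:
            status = "ok"
        else:
            status = "mismatch"
        report[name] = (found, status)
    return report


for name, (found, status) in check_packages(EXPECTED).items():
    print(f"{name}: {found} -> {status}")
```

Save it as a file and run it with the python executable from the embedded folder, so that it inspects the same environment comfyui uses.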

Slider Nodes - how can I modify the "default" values:

Right click the slider node, choose Properties and set the values you like 🙂🙃

Description

Attention:

Please use this version if you have not installed Triton and Sage Attention.

(Please see my description about Sage Attention).

Two minor bugs have been fixed in version 1.1:

1. KSampler set from fixed to random.

2. ‘03 Image 2 Video’: missing connections to generate the correct resolution set.

If you have already corrected this in your workflow, there is no need to download this update.

Comments (20)

andreas456 · Aug 16, 2025 · 4 reactions

10/10. Ive got a 5090, what do you recommend to adjust to speed that up ?

arkinson (Author) · Aug 16, 2025

I have no experience with the RTX 5090, but I would say you're playing in a completely different league 😅🤣

andreas456 · Aug 17, 2025 · 1 reaction

arkinson haha, all just a big tax write off. il have to start sussing out books for the full picture / process

arkinson (Author) · Aug 17, 2025

andreas456 "big tax write off" - oh my, I like it 😅 Do you have experience with video generation in comfyui? What I wanted to say is that my workflow is aimed at users with "small" hardware. If you have a 32 GB VRAM GPU, you may want to use other workflows that are better suited to your hardware.

mschreiner1988422 · Aug 16, 2025 · 3 reactions

one of the best Workflows. Amazing

arkinson (Author) · Aug 16, 2025

Thank you - I am glad it is useful 🙂

Zealot · Aug 18, 2025 · 1 reaction

KSamplerAdvanced

Given groups=1, weight of size [5120, 36, 1, 2, 2], expected input[1, 32, 21, 60, 46] to have 36 channels, but got 32 channels instead

arkinson (Author) · Aug 18, 2025

Look here, for example: wan2.2 · Issue #9092 · comfyanonymous/ComfyUI

I would bet you did not use the right models. VAE 2.1??

Zealot · Aug 18, 2025 · 1 reaction

arkinson solved, update the comfyui version

SDGFDGFHFd46346 · Aug 18, 2025 · 4 reactions

I downloaded all the correct models. I download the workflow, load it into comfy, run the process, it finishes rather fast, but it looks like garbage. Like... really washed out and you can just barely recognize that there is a cat in a basket. Any idea what might cause this?

Light6969 · Aug 19, 2025 · 1 reaction

Are you loading the right light loras. Sometimes it looks like it's loaded but you have to select the lora yourself to be sure.

arkinson (Author) · Aug 19, 2025

Yes, I would recommend the same as @Light2020462. Please check every loader node and select the model from your local path.

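dayeliu950672 · Aug 19, 2025 · 2 reactions

I would like to know where to set the length of the video. I could not find a node that sets the video duration, and everything I generate here is a 1-second video.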

arkinson (Author) · Aug 19, 2025

It's super easy. Just use the slider nodes "clip length (in seconds)". You will find them in the "03 Text 2 Video" or "03 Image 2 Video" groups.

hdean · Aug 19, 2025 · 3 reactions

Probably I am just stupid, but I can't figure this thing. THe text to image gives me the prompted image but it's drawn. Can't get an actual video of a real cat. And, for the life of me, I can't get the i2v to work. Just get errors in the Ksampler. I will try to run it again tonight so I can post the error. But, I am lost here.

arkinson (Author) · Aug 19, 2025

Just to rule out the possibility of incorrect models being loaded:

1. Please check twice that you downloaded the exact same models you see in the loader nodes.

2. Go to each loader node and select the right model from your local path.

Rafaeln7435 · Aug 20, 2025 · 3 reactions

mister arkinson, you are god,

arkinson (Author) · Aug 20, 2025 · 1 reaction

really? 🙄 😇 🙂

Rafaeln7435 · Aug 20, 2025 · 1 reaction

@arkinson before my comfyu was 1 hour per 5 sec gif , now is 15 min with my 3060 12vram . Dammit. Xd

arkinson (Author) · Aug 21, 2025 · 1 reaction

@Rafaeln7435 Hi - thank you so much for your feedback 👍 I'm really glad this workflow is usable for you 🙂

Workflows

Wan Video 2.2 T2V-A14B

by arkinson

Details

Downloads: 3,160

Platform: CivitAI

Platform Status: Available

Created: 8/16/2025

Updated: 6/30/2026

Deleted

Files

wan22VideoSoundWorkflow_v11NoSageattention.zip

Size: 11.90 KB

SHA256: aa5cd7ca6b13669d50158a6fab8973bc4f3c70d65ba4236537e6163c353b1884

Mirrors

HuggingFace (1 mirrors)

wan22VideoSoundWorkflow_v11NoSageattention.zip

CivitAI (1 mirrors)

wan22VideoSoundWorkflow_v11NoSageattention.zip
