This workflow is designed for ComfyUI and leverages the Wan 2.2 Enhanced NSFW I2V model (in GGUF and Safetensors formats) to generate high-quality, dynamic image-to-video (I2V) animations, with a strong focus on NSFW content. It supports advanced features like model switching (high/low quality), audio generation via MMAudioSampler, video upscaling, color matching, and final video compilation at up to 50 FPS. The workflow includes built-in LoRA triggers for specific NSFW scenarios (e.g., cowgirl, deepthroat, cunnilingus, full nelson), making it ideal for creating sensual, explicit animations with realistic motion, lighting, and details.
Key Features:
Image-to-Video Generation: Converts a single input image into a video sequence using the WanImageToVideo node. Supports frame lengths up to 81, batch sizes, and resolutions like 480x720 (configurable via nodes like WIDTH, HEIGHT, LENGTH).
Model Variants: Switch between high-fidelity (Q8H/FP8H) and lightweight (Q8L/FP8L) versions of the Wan 2.2 model for performance optimization. Includes SD3 sampling shifts for better motion coherence.
Prompting System: Dual CLIP text encoders for positive/negative prompts. Built-in notes provide example triggers and prompts for NSFW acts (see original for examples).
Audio Integration: Generates ambient audio (e.g., moans, music) using MMAudioSampler with customizable duration, steps, CFG, and prompts. Negative audio prompts avoid low-quality noise or speech.
Post-Processing: VAE decoding for clean frames; image resizing and upscaling; color matching and restoration; video combining with VHS_VideoCombine (supports H264/H265 MP4, ping-pong looping, CRF quality control, and metadata saving). Preview options: Animation preview at 16 FPS and audio playback.
Optimization: VRAM cleanup nodes, CPU/GPU device switching, and batch processing for efficiency. Supports random seeds for variation.
Output: Saves videos/images in folders like "LongVid/%date:yyyy-MM-dd%/%date:hhmmss%" with prefixes (e.g., V for video, I for image, A for audio). Final videos can be frame-interpolated to 50 FPS.
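As a rough illustration of how the output folder pattern above resolves, here is a minimal Python sketch. It is not ComfyUI's own implementation: the function name `expand_date_tokens` is hypothetical, and the token meanings (yyyy, MM, dd, and a 24-hour hh, mm, ss) are assumptions based on the pattern shown.

```python
import re
from datetime import datetime

# Assumed token meanings; ComfyUI's own expansion may differ in detail.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "hh": "%H", "mm": "%M", "ss": "%S"}


def expand_date_tokens(pattern: str, now: datetime) -> str:
    """Replace every %date:FORMAT% placeholder in `pattern` using `now`."""

    def render(match: re.Match) -> str:
        fmt = match.group(1)
        # Substitute each known token; other characters (e.g. "-") pass through.
        return re.sub(
            "yyyy|MM|dd|hh|mm|ss",
            lambda m: now.strftime(_TOKENS[m.group(0)]),
            fmt,
        )

    return re.sub(r"%date:([^%]+)%", render, pattern)


example = expand_date_tokens(
    "LongVid/%date:yyyy-MM-dd%/%date:hhmmss%", datetime(2025, 1, 2, 3, 4, 5)
)
print(example)
```

With this sketch, each run lands in its own time-stamped subfolder under a per-day folder, which keeps the V/I/A files of one generation together.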
Requirements:
ComfyUI Version: Latest stable (tested on 2024–2026 builds).
Models (place in the appropriate ComfyUI folders: models/unet, models/vae, models/clip_vision, models/text_encoders, etc.):
Main Diffusion Models (Wan 2.2 Enhanced NSFW SVI Camera variants) — from nolightning's Lightning Edition pack:
wan22EnhancedNSFWSVICamera_nsfwFASTMOVEV2Q8H.gguf → https://civarchive.com/api/download/models/2540892?type=Model&format=GGUF&size=full&fp=fp8
wan22EnhancedNSFWSVICamera_nsfwFASTMOVEV2Q8L.gguf → https://civarchive.com/api/download/models/2540896?type=Model&format=GGUF&size=full&fp=fp8
wan22EnhancedNSFWSVICamera_nsfwFASTMOVEV2FP8H.safetensors → https://civarchive.com/api/download/models/2477539?type=Model&format=SafeTensor&size=full&fp=fp8
wan22EnhancedNSFWSVICamera_nsfwFASTMOVEV2FP8L.safetensors → https://civarchive.com/api/download/models/2477548?type=Model&format=SafeTensor&size=full&fp=fp8
VAE: Wan2.1_VAE.pth
CLIP Vision: clip_vision_h.safetensors
CLIP Text Encoder: umt5_xxl_fp8_e4m3fn_scaled.safetensors
Audio: MMAudio model (via comfyui-mmaudio extension) — install the extension; models are usually auto-downloaded or available in the repo.
Upscale model: 4x_NMKD-Siax_200k → https://civarchive.com/api/download/models/2052724?type=Model&format=PickleTensor
Custom Nodes/Extensions (install via ComfyUI Manager):
comfyui-gguf (for GGUF model loading).
ComfyUI_Comfyroll_CustomNodes (math/utils).
comfyui-easy-use (cleanGpuUsed, mathFloat).
comfyui-kjnodes (INTConstant, ImageResizeKJv2, LoadVideosFromFolder, PreviewAnimation).
comfyui-videohelpersuite (VHS_VideoCombine).
comfyui-mmaudio (MMAudioSampler, audio preview).
comfyui-image-saver (Sampler/Scheduler selectors).
controlaltai-nodes (TwoWay/ThreeWaySwitch).
ComfyLiterals (Float node).
comfyui_memory_cleanup (VRAMCleanup).
Hardware: GPU with at least 12GB VRAM recommended for high-quality runs (e.g., 81-frame videos). CPU fallback available for some nodes.
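Before loading the workflow, it can help to confirm that the model files listed above are where the loaders expect them. The sketch below is one way to do that. The file names come from the list above, but which file belongs in which folder is an assumption (GGUF files in models/unet, and so on), and the helper name `missing_models` is hypothetical. Adjust the mapping to your own install and add the Safetensors or upscale models if you use them.

```python
from pathlib import Path

# Assumed folder placement; edit to match your installation.
EXPECTED = {
    "models/unet": [
        "wan22EnhancedNSFWSVICamera_nsfwFASTMOVEV2Q8H.gguf",
        "wan22EnhancedNSFWSVICamera_nsfwFASTMOVEV2Q8L.gguf",
    ],
    "models/vae": ["Wan2.1_VAE.pth"],
    "models/clip_vision": ["clip_vision_h.safetensors"],
    "models/text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
}


def missing_models(comfy_root, expected=EXPECTED):
    """Return 'folder/file' entries that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [
        f"{folder}/{name}"
        for folder, names in expected.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


# Example usage (the path is a placeholder):
# for entry in missing_models("ComfyUI"):
#     print("missing:", entry)
```

An empty result means every listed file was found. A non-empty result usually points to a wrong folder or a differently named download.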
How to Use:
Load the Workflow: Import the JSON into ComfyUI.
Input Image: Connect an image to the "IMAGE" node (e.g., via Load Image). Resize settings are in the "LOAD IMAGE & RESIZE" group.
Prompts: Edit the POSITIVE/NEGATIVE nodes with your description. Use the built-in trigger words for best NSFW results.
Settings: Adjust in "VIDEO SETTINGS" group:
Resolution: WIDTH/HEIGHT (default 480x720).
Frames: LENGTH (default 81), STEPS (default 8), CFG (default 1).
Seed: Randomize for variations.
Sampler/Scheduler: Euler Ancestral + Simple (defaults).
Batch Size: 1 (increase for multiples).
Run: Queue the prompt. Monitor VRAM with cleanup nodes.
Outputs: Videos save to ComfyUI/output/LongVid (customizable). Preview animation and audio in the workflow.
Advanced: Toggle the high/low model switches to trade quality for speed. Add audio prompts in MMAudioSampler. Use the "UPSCALE" group for a higher-resolution, smoother 50 FPS final video.
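To relate the LENGTH setting to clip duration, the arithmetic can be sketched as below. It assumes the raw video plays at 16 FPS (the preview rate mentioned above) and that valid frame counts have the form 4n+1, as the default of 81 suggests. Both are assumptions, and the function names are illustrative only.

```python
import math


def clip_seconds(length_frames: int, fps: float = 16.0) -> float:
    """Duration in seconds of a clip with `length_frames` frames at `fps`."""
    return length_frames / fps


def frames_for_seconds(seconds: float, fps: float = 16.0) -> int:
    """Smallest frame count of the form 4n+1 covering `seconds` at `fps`."""
    n = math.ceil((seconds * fps - 1) / 4)
    return 4 * n + 1


print(clip_seconds(81))        # default LENGTH at 16 FPS
print(frames_for_seconds(10))  # frames needed for a 10-second clip
```

Under these assumptions the default LENGTH of 81 corresponds to roughly 5 seconds of raw video, and a 10-second clip needs about twice as many frames.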
Tips for Best Results:
NSFW Focus: Start with the example prompts in the notes for fluid motion (e.g., thrusting, jiggling). Avoid overlong prompts to prevent artifacts.
Audio Sync: Match audio duration to video length (default 10s). Use positive prompts like "moans, sensual sounds" and negatives to avoid distortion.
Performance: For low VRAM, use GGUF low models and disable audio/upscaling. Force offload in MMAudioSampler if needed.
Customization: Experiment with LoRAs (loaded in "LOAD LORA'S" group) for specific styles. Negative prompts handle artifacts like blur, distortion, or bad anatomy.
This workflow is optimized for explicit, high-detail NSFW I2V—perfect for creators exploring sensual animations.
Disclaimer:
This workflow is provided for entertainment, artistic, and creative purposes only.
It may not be used for any illegal, harmful, non-consensual, or malicious activities.
Please use it responsibly and respect all applicable laws and ethical guidelines.
Comments (19)
I will be testing your project soon, but I would like to ask you a few questions. I would greatly appreciate it if you could answer them.
1. How long does it take you to generate the video?
2. What is your graphics card?
3. Can you tell me what really influences the rendering time? Is it the graphics card?
A raw 10 s video at 480x720 takes about 5 minutes to generate; the audio, color matching, upscaling, and frame interpolation take another 5 minutes.
My ComfyUI also uses SageAttention 2, which speeds up generation.
My setup is:
RTX 5080
32GB RAM at 5600 speed
Intel i7-14700k
@MrXin Sir, I'm having a problem. I'm using the indicated settings:
GGUF, Diffusion Model, VAE, Clip, Clip Vision, LORA, and Prompts.
However, my character doesn't take off her clothes, doesn't perform the action, she just keeps swaying. I'm using it exactly as the workflow comes with the configuration. What do you think it could be?
@Valorizando Can you give me the prompt that you use? And which model have you downloaded? I use this GGUF model: https://civitai.com/api/download/models/2540892?type=Model&format=GGUF&size=full&fp=fp8
@MrXin The mistake was mine — I downloaded the wrong one. I admit it’s easy to get confused without the direct link. It might even be a good suggestion, if you think it’s valid, to add the direct model links in your workflow description on Civitai to avoid this kind of mistake. But thank you very much — I’m going to test it now.
@Valorizando Thanks for the advice, I will put the link in the description :)
Nice work. How can I slow down the video speed? It is really fast right now. Length is in seconds, not in frames, so could that be the problem?
Lower the FPS then interpolate?
Did you change any settings in the workflow?
Normally the raw video is made at 16 FPS, and when you load the video into the video editor it is still 16 FPS. It is then interpolated to 25 FPS for MMAudio, and after the audio and upscaler it goes from 25 FPS to 50 FPS.
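To make the frame-rate stages in this reply concrete, here is a small Python sketch of the arithmetic. The 16, 25, and 50 FPS stages come from the reply; the function names are made up for illustration. It also shows one possible cause of "too fast" playback, which is an assumption rather than a confirmed diagnosis: frames made for one rate being played at a higher rate without interpolation.

```python
def frames_at(duration_s: float, fps: float) -> int:
    """Number of frames needed to cover duration_s at the given rate."""
    return round(duration_s * fps)


def playback_speed(source_fps: float, playback_fps: float) -> float:
    """Speed-up factor when frames made for source_fps play at playback_fps."""
    return playback_fps / source_fps


duration = 10  # seconds
for fps in (16, 25, 50):
    print(fps, "FPS ->", frames_at(duration, fps), "frames")

# Frames generated for 16 FPS but combined at 50 FPS without interpolation:
print(playback_speed(16, 50))
```

If interpolation adds the extra frames, the duration stays the same at every stage. If a stage is bypassed while the final frame rate stays high, the clip gets shorter and the motion looks sped up.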
how come we need both the .gguf and the diffusion models?
You don’t need to use both; you can remove or disable the one you don’t need. The switch can handle it if one of the two is disabled. I like to experiment with them to compare the results using the same prompts but different diffusion models.
gotcha, thanks!
INCREDIBLE WORKFLOW! THANK YOU!!!
Hey, good work here.
I was just wondering, why does the model repeat the motion twice during the whole 10 seconds?
Let's say the model is removing the top: she does this, puts it back on, and then removes it again. Why?
Hey, Wan 2.2 can only generate 5-second videos without repetitions. I use the 10-second setting for videos where some repetition might occur. I'm currently working on a workflow for generating longer videos, but it's still in the testing phase.
@MrXin Oh cool, don't longer videos mean more repetitions then?
@v1k3d1194559 no, because you work with the last frame from the previous video. I just uploaded the new workflow for longer videos.
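The last-frame idea in this reply can be sketched in a few lines of Python. This is not the author's longer-video workflow, only an illustration of the chaining principle. `generate_segment` is a hypothetical callable standing in for one image-to-video run, and the sketch assumes each clip begins with its start image.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def chain_segments(
    first_image: Frame,
    generate_segment: Callable[[Frame], List[Frame]],
    segments: int,
) -> List[Frame]:
    """Build a long clip by seeding each segment with the previous last frame."""
    frames: List[Frame] = []
    start = first_image
    for index in range(segments):
        clip = generate_segment(start)
        # Later clips start on the previous clip's last frame, so drop the
        # duplicate first frame when joining (assumption stated above).
        frames.extend(clip if index == 0 else clip[1:])
        start = clip[-1]
    return frames


# Toy stand-in: "frames" are integers and each run yields three frames.
demo = chain_segments(0, lambda s: [s, s + 1, s + 2], segments=3)
print(demo)
```

Because every segment continues from where the previous one ended, the motion moves forward instead of restarting from the original image.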
@MrXin Thanks, I'll check that. I also have a question: does SageAttention work on 50-series NVIDIA cards? Even after installing SageAttention, I keep getting a Python.h error. Claude said there are no prebuilt SageAttention wheels for Blackwell cards yet. Is this true?
@v1k3d1194559 Hey, SageAttention 2 also works with the RTX 5080. SageAttention 3 on Windows can be a bit tricky to set up.
I use ComfyUI-Easy-Install from here:
https://github.com/Tavris1/ComfyUI-Easy-Install
After installing ComfyUI, there’s another folder inside the ComfyUI directory called addons. From there, you can install SageAttention. It’s really easy—just follow the instructions in the README file, and it works great.
There’s also a YouTube video that explains the whole process:
https://www.youtube.com/watch?v=CgLL5aoEX-s&t=753s