Professional Video-to-Video Transformation with WAN VACE
This ComfyUI workflow for WAN VACE provides a complete video-to-video (V2V) pipeline for long-form videos. It breaks a lengthy video into manageable segments, processes each segment individually, joins the results without visible transitions, and then upscales and frame-interpolates the combined output.
Key Features
Long Video Processing: Handle extended video content by breaking into segments and seamlessly rejoining
Complete V2V Pipeline: Full video-to-video transformation workflow
Seamless Video Joining: Custom nodes for professional video concatenation without visible transitions
Multi-Step Process: Generate → Join → Combine → Upscale → Interpolate
Professional Quality: High-quality output with customizable settings
Memory Optimization: Low VRAM options for various GPU configurations
Batch Processing: Process multiple video segments efficiently
Scalable Architecture: Handle videos of any length through intelligent segmentation
Requirements
Essential Model Files
WAN GGUF Models
Download from: QuantStack/Wan2.1_T2V_14B_FusionX_VACE-GGUF
Choose your preferred quantization (Q3_K_S, Q8_0, etc.)
Place in: ComfyUI/models/unet
WAN VAE
Download wan_2.1_vae.safetensors from: Comfy-Org/Wan_2.1_ComfyUI_repackaged
Place in: ComfyUI/models/vae
WAN Text Encoder
Download GGUF text encoders from: city96/umt5-xxl-encoder-gguf
Place in: ComfyUI/models/text_encoders
Required Custom Nodes
Important: The WanVideoVaceSeamlessJoin and CombineVideoClips custom nodes are not available in ComfyUI-Manager; download them from this page.
ComfyUI Extensions
Install these custom nodes using ComfyUI-Manager:
ComfyUI-GGUF
ComfyUI-VideoHelperSuite
ComfyUI-KJNodes
ComfyUI-ControlNet-Aux
ComfyUI-Frame-Interpolation
ComfyUI-Easy-Use
Step-by-Step Guide
Initial Setup
Configure Constants:
Width/Height: 576x1024 (9:16 aspect ratio) or match your source video
Length: 81 frames per segment
Skip Frames: Start with 0
Filename Prefix: Set your output folder and prefix
Load Source Materials:
Load your source video for restyling
Load reference image (ensure similar pose to first video frame)
Use SDXL/FLUX with LoRA and ControlNet for best pose matching
Step 1: Generate WAN Videos
Write Prompts:
Describe subject, outfit, and background
Include action phrases for dynamic results
Generate Video Segments:
Click run to generate first 81-frame video segment
Increase skip frames by 81 to process next segment
Repeat for the entire length of your source video
Final segment can be shorter but may have lower quality
For long videos: Continue this process until you've covered the full duration
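The skip-frame bookkeeping above can be planned ahead of time. The following is a minimal sketch, not part of the workflow itself: `segment_offsets` is a hypothetical helper name, and it assumes you already know the total frame count of your source video.

```python
def segment_offsets(total_frames: int, segment_len: int = 81) -> list[tuple[int, int]]:
    """Return (skip_frames, frame_count) for each segment of the source video."""
    if total_frames <= 0 or segment_len <= 0:
        raise ValueError("total_frames and segment_len must be positive")
    segments = []
    skip = 0
    while skip < total_frames:
        # The final segment may be shorter than segment_len.
        count = min(segment_len, total_frames - skip)
        segments.append((skip, count))
        skip += segment_len
    return segments


# Example: a 200-frame source video (hypothetical length).
for skip, count in segment_offsets(200):
    print(f"skip_frames={skip}  frames={count}")
```

Each printed `skip_frames` value is what you would enter in the Skip Frames constant before running that segment. A short last segment shows up as a smaller frame count, which is where the quality caveat above applies.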
Step 2: Join Videos Seamlessly
Configure Joining:
Set folder path to your generated videos
Set filename prefix matching your generated files
Start with filename suffix = 1
Use the same prompt as in Step 1
Join Process:
Run to join first and second videos
Increase filename suffix by 1
Run to join second and third videos
Repeat until all segments are joined
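The join sequence above pairs each segment with the next one, so N segments need N - 1 join runs. The sketch below lists those pairs. The `prefix_00001.mp4` naming pattern and the `join_pairs` helper are assumptions for illustration; check the filenames your own runs actually produced.

```python
def join_pairs(prefix: str, segment_count: int,
               digits: int = 5, ext: str = ".mp4") -> list[tuple[int, str, str]]:
    """Return (filename_suffix, first_file, second_file) for each adjacent pair."""
    pairs = []
    for suffix in range(1, segment_count):
        # Assumed naming: <prefix>_<zero-padded index><ext>
        first = f"{prefix}_{suffix:0{digits}d}{ext}"
        second = f"{prefix}_{suffix + 1:0{digits}d}{ext}"
        pairs.append((suffix, first, second))
    return pairs


# Example: three generated segments with a hypothetical "wan_fx" prefix.
for suffix, first, second in join_pairs("wan_fx", 3):
    print(f"suffix={suffix}: join {first} + {second}")
```

With a single segment the list is empty, which matches the fact that there is nothing to join.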
Step 3: Combine, Upscale, and Interpolate
Final Processing Setup:
Set folder path to joined videos
Keep filename suffix = 1 (constant)
Set combine filename for final output
Set upscale filename for enhanced version
Execute Final Pipeline:
Combine all joined videos
Upscale using RealESRGAN (2x scale)
Interpolate frames using FILM VFI (2x frame rate)
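The effect of the two 2x stages on the output can be worked out in advance. This is a small arithmetic sketch only: the 576x1024 size comes from the Initial Setup constants, while the 16 fps input rate is an assumption, so substitute your clip's actual frame rate.

```python
def final_output_spec(width: int, height: int, fps: float,
                      upscale: int = 2, interp: int = 2) -> dict:
    """Resolution and frame rate after upscaling and frame interpolation."""
    return {
        "width": width * upscale,    # upscaling multiplies both dimensions
        "height": height * upscale,
        "fps": fps * interp,         # interpolation multiplies the frame rate
    }


spec = final_output_spec(576, 1024, fps=16)  # 16 fps is an assumed input rate
print(spec)
```

Under these assumptions the result is 1152x2048 at 32 fps, which is useful for estimating disk space and processing time before running the final stage.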
Advanced Settings
Low VRAM Configuration
Use the UnetLoaderGGUFDisTorchMultiGPU node for memory optimization
Set virtual_vram_gb to 2.0-4.0 for GPUs with 12 GB of VRAM or less
Enable use_other_vram for additional memory fallback
Performance Optimization
Bypass PathchSageAttentionKJ and ModelPatchTorchSettings if you don't have Triton
Adjust batch sizes based on your GPU memory
Use appropriate quantization levels for your hardware
Tips for Best Results
Long Video Strategy: Plan your segmentation approach - 81 frames per segment ensures smooth transitions while maintaining manageable processing chunks
Reference Image Quality: Use high-quality reference images with poses similar to your source video's first frame
Prompt Engineering: Be specific about subject details, clothing, and background elements
Segment Planning: Plan your video segments to maintain narrative continuity across the entire video length
Hardware Considerations: Adjust settings based on your GPU capabilities - longer videos benefit from optimized VRAM settings
Consistency Maintenance: Keep prompts consistent across all segments to ensure visual coherence in the final long video
Troubleshooting
OOM Errors: Increase virtual_vram_gb or reduce batch sizes
Missing Nodes: Ensure all custom nodes are properly installed
Quality Issues: Check reference image alignment and prompt specificity
Processing Slow: Consider using a lower-bit quantization (e.g., Q3_K_S instead of Q8_0) for faster generation
Custom Nodes Parameter Guide
WanVideoVaceSeamlessJoin Node
This custom node seamlessly joins two video clips with intelligent masking for smooth transitions.
Parameters:
mask_last_frames (INT): Number of frames to mask at the end of the first video
Default: 0
Range: 0-20
Use 0 for no masking, 5-10 for subtle blending
mask_first_frames (INT): Number of frames to mask at the beginning of the second video
Default: 10
Range: 0-20
Recommended: 10 frames for smooth transitions
frame_load_cap (INT): Maximum number of frames to load from each video
Default: 81
Range: 1-1000
Should match your segment length (typically 81)
first_video_path (STRING): Full path to the first video file
Format: "C:\path\to\video1.mp4"
Use absolute paths for reliability
second_video_path (STRING): Full path to the second video file
Format: "C:\path\to\video2.mp4"
Ensure file exists and is accessible
Outputs:
image: Combined video frames as an image sequence
mask: Generated mask for the transition area
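To make the two masking parameters concrete, the sketch below marks which frames of a joined pair would fall inside the transition area. It is an illustration of the parameter semantics described above, not the node's actual implementation: `transition_mask_flags` is a hypothetical helper, and how the real node loads frames and builds its mask tensor is not documented here.

```python
def transition_mask_flags(frames_a: int, frames_b: int,
                          mask_last_frames: int = 0,
                          mask_first_frames: int = 10) -> list[bool]:
    """Per-frame flags for clip A followed by clip B; True means 'masked'."""
    if not 0 <= mask_last_frames <= frames_a:
        raise ValueError("mask_last_frames must be between 0 and frames_a")
    if not 0 <= mask_first_frames <= frames_b:
        raise ValueError("mask_first_frames must be between 0 and frames_b")
    # Tail of the first clip is masked, head of the second clip is masked.
    flags_a = [False] * (frames_a - mask_last_frames) + [True] * mask_last_frames
    flags_b = [True] * mask_first_frames + [False] * (frames_b - mask_first_frames)
    return flags_a + flags_b


# Example with the documented defaults and two 81-frame segments.
flags = transition_mask_flags(81, 81)
print(f"{sum(flags)} of {len(flags)} frames are masked")
```

With the defaults, only the first 10 frames of the second clip are flagged, so the first clip is left untouched and the transition is regenerated entirely at the start of the second.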
CombineVideoClips Node
This node combines multiple video clips into a single continuous sequence with advanced masking options.
Parameters:
frame_load_cap (INT): Maximum frames to load per video
Default: 81
Range: 1-1000
Should match your segment frame count
mask_last_frames (INT): Frames to mask at the end of each video (except last)
Default: 0
Range: 0-20
Use 0 for clean cuts, 5-10 for fade effects
mask_first_frames (INT): Frames to mask at the beginning of each video (except first)
Default: 10
Range: 0-20
Recommended: 10 for smooth transitions
first_video_path (STRING): Path to the first video in sequence
Base video - typically your original generated video
first_joined_video_path (STRING): Path to first seamlessly joined video
Result from first WanVideoVaceSeamlessJoin operation
second_joined_video_path (STRING): Path to second seamlessly joined video
Result from second WanVideoVaceSeamlessJoin operation
third_joined_video_path (STRING): Path to third seamlessly joined video
Continue pattern for additional segments
fourth_joined_video_path (STRING): Path to fourth seamlessly joined video
Optional - use if you have this many segments
fifth_joined_video_path (STRING): Path to fifth seamlessly joined video
Optional - maximum supported segments
last_video_path (STRING): Path to the final video in sequence
The last generated video segment
Output:
image: Combined video sequence as image frames ready for final processing
Parameter Optimization Tips:
For Seamless Joining:
Short transitions: mask_first_frames = 5, mask_last_frames = 0
Smooth blending: mask_first_frames = 10, mask_last_frames = 5
Long crossfades: mask_first_frames = 15, mask_last_frames = 10
For File Paths:
Ensure all video files exist before running
Use consistent naming conventions for easier batch processing
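Checking the paths before a run can be done outside ComfyUI. The following is a minimal sketch using only the Python standard library; `missing_files` is a hypothetical helper and the example paths are placeholders, not files the workflow is known to create.

```python
from pathlib import Path


def missing_files(paths: list[str]) -> list[str]:
    """Return the subset of paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Placeholder paths; replace with the values you entered in the node.
candidates = [
    "output/wan_fx_00001.mp4",
    "output/wan_fx_00002.mp4",
]
missing = missing_files(candidates)
if missing:
    print("Missing before run:", ", ".join(missing))
else:
    print("All video files found")
```

Running this against every path field of the CombineVideoClips node shows quickly whether a slot points at a segment that was never generated.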
Frame Count Considerations:
Set frame_load_cap to match your segment length (usually 81)
Smaller values may truncate longer segments
This workflow provides professional-grade video transformation capabilities with comprehensive control over the entire pipeline from generation to final output.
Description
v1.0 - Initial Release
Complete WAN VACE V2V workflow
Seamless video joining capabilities
Integrated upscaling and interpolation
Memory optimization options
Comprehensive documentation and setup guide
Support for custom aspect ratios and resolutions
Comments (6)
First, great work, amazing workflow!
For those who don't know where to download the .py files or how to install them:
Step 1: Download the Scripts
Download the files seamless_join_video_clips.py and combine_video_clips.py from this page or from the original source (e.g., GitHub or a shared repo).
Move or copy both .py files into your ComfyUI custom nodes folder, usually located at:
ComfyUI/custom_nodes/
-
Step 2: Open CMD in the Script Directory
Open the folder where you placed the .py files.
In the Windows File Explorer, click the address bar at the top of the window.
Replace the path with cmd and press Enter.
This will open a Command Prompt (CMD) window directly in that folder.
-
Step 3: Install the Scripts
In the CMD window, run the following commands one at a time:
python seamless_join_video_clips.py install
python combine_video_clips.py install
This step ensures that any required dependencies or setup operations are performed.
-
Step 4: Restart ComfyUI
After installation, you must restart ComfyUI (or your Python environment) for the new nodes to appear in the interface.
If you're running ComfyUI via a Python script (python main.py), just stop it and run it again.
If you use a ComfyUI launcher, close and relaunch it.
-
You're Done!
-
You should now see new nodes or functions related to:
Seamless Join Video Clips
Combine Video Clips
These tools are ideal for merging multiple video outputs into one continuous sequence with better transitions and handling.
-
Notes
If you get a "Python not recognized" error, ensure Python is installed and added to your system environment variables.
If using a virtual environment for ComfyUI, make sure CMD is run inside that environment.
This is NOT pro-level VACE use but the most elementary workflow possible. VACE is capable of so much more. To become an expert, one must begin to control every frame within the VACE reference video input. For instance, one can place keyframes anywhere within this stream, not just at the first frame (or even the first and last). Pose frames can be mixed with keyframes. In other words, one wants a workflow that shows every frame of the reference video and allows multiple configurable inputs to this frame array, with the necessary mask data.
Now the very simple use of a motion control video (which you can capture from the camera of your phone) to puppeteer the reference image is a very cool starting point, but I would suggest a much simpler workflow to do this, so the user has a better chance to understand what is happening, and thus is more likely to want to start experimenting in more sophisticated directions.
The custom nodes don't work anymore?
I mean, I can't access them.
If there are longer videos, change how to merge them
First of all, thank you for sharing this workflow!
I spent almost 3 full days researching the best ways to create long videos, did experiments with Hunyuan framepack (really did not like it), then Wan, and then I found your system with Wan Vace.
The documentation is nicely explained.
I ran into a small problem at the end of your workflow that I'm trying to fix. Hope you can help.
I'm creating a 10-second video, so the final result should be two clips (81 frames each) combined. I generated the two videos in Step 1 and the joined middle video in Step 2. But in Step 3, I get a ComfyUI error that says "can't find file wan_fx_00005.mp4". Of course there is no 00005 file (there are only two videos, 00001 and 00002), but for some reason the workflow is trying to use 00005. Did I miss something?