Turn images into talking, moving characters with pose and audio control.
Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning — you still choose inputs, prompts, and settings.
Open preloaded workflow on RunComfy (browser)
Why RunComfy first
- Fewer missing-node surprises — run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout — useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON — the zip follows the same runnable workflow you can open on RunComfy.
When downloading for local ComfyUI makes sense — you want full control over models on disk, batch scripting, or offline runs.
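For batch scripting specifically, here is a minimal sketch of queueing the workflow against a local ComfyUI server, assuming the default 127.0.0.1:8188 endpoint and a graph exported via ComfyUI's "Save (API Format)" option; the file name, node ids, and input paths below are hypothetical placeholders for your own export:

```python
import json
import urllib.request

# Hypothetical path: export the graph with "Save (API Format)" first.
WORKFLOW_PATH = "pose_control_lipsync_wan22_s2v_api.json"
COMFYUI_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI endpoint

def queue_run(image_path: str, audio_path: str) -> str:
    """Queue one run of the workflow with the given inputs; returns the prompt id."""
    with open(WORKFLOW_PATH, "r", encoding="utf-8") as f:
        workflow = json.load(f)

    # Node ids and input names depend on your exported graph; these are placeholders.
    workflow["52"]["inputs"]["image"] = image_path   # subject-image loader node
    workflow["58"]["inputs"]["audio"] = audio_path   # audio loader node

    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(COMFYUI_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]

if __name__ == "__main__":
    print(queue_run("inputs/subject.png", "inputs/vocals.wav"))
```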
How to use (local ComfyUI)
1. Load inputs (images/video/audio) in the marked loader nodes.
2. Set prompts, resolution, and seeds; start with a short test run (see the sketch after these steps).
3. Export from the Save / Write nodes shown in the graph.
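For step 2, here is a minimal sketch of deriving short-test settings from your final targets; the halving factors and caps are assumptions to adjust, not values shipped with the workflow:

```python
def test_run_settings(width: int, height: int, frames: int, steps: int) -> dict:
    """Scale final settings down for a quick sanity-check render."""
    return {
        "width": max(256, width // 2),    # roughly quarter the pixel count
        "height": max(256, height // 2),
        "frames": min(frames, 33),        # about two seconds at 16 fps
        "steps": max(4, steps // 2),      # fewer denoising steps for speed
    }

print(test_run_settings(width=720, height=1280, frames=161, steps=20))
```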
Expectations — First run may pull large weights; cloud runs may require a free RunComfy account.
Overview
This workflow lets you create expressive and controllable character animations with audio-driven lip synchronization and pose-based motion. Starting with a static image, a pose reference, and an audio clip, it generates seamless videos where characters move naturally and speak in sync. You can tailor movement and expressions with high precision, making it great for avatars, digital storytelling, or music videos. Backed by the Wan 2.2 model, it delivers natural body tracking and smooth speech alignment. It is designed for creators who need efficient, high-quality video generation with creative control.
Important nodes in the ComfyUI Pose Control LipSync with Wan2.2 S2V workflow:
WanSoundImageToVideo (#55)
The heart of the workflow: it conditions Wan2.2‑S2V on your prompt, vocals, subject image, and pose control video. Adjust only what matters: set width, height, and length to match your subject image and audio duration, and plug in a preprocessed pose video for motion control. Leave ref_motion empty unless you plan to inject a separate camera track. The model’s speech‑to‑video behavior is described in Wan‑AI/Wan2.2‑S2V‑14B and Wan‑Video/Wan2.2.
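To size length against your clip, one approach is a small helper like the following (a sketch: the 16 fps default and the 4n+1 frame alignment common to Wan-family models are assumptions to verify against the node's defaults):

```python
import math

def wan_length_for_audio(audio_seconds: float, fps: int = 16) -> int:
    """Frame count covering the audio, rounded up to the 4n+1 pattern
    Wan-family models commonly expect (assumption; check the node tooltip)."""
    raw = math.ceil(audio_seconds * fps)
    n = math.ceil((raw - 1) / 4)
    return 4 * n + 1

print(wan_length_for_audio(7.5))  # 7.5 s of vocals -> 121 frames at 16 fps
```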
DWPreprocessor (#78)
Generates pose maps using YOLOX for detection and DWPose for whole‑body keypoints. Strong pose cues help Wan follow limbs and torso while audio controls lips and expressions. If your reference has heavy camera motion, use a pose video that aligns viewpoint and timing with the intended performance. DWPose and its variants are documented in IDEA‑Research/DWPose.
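Since mismatched framing degrades motion transfer, a quick pre-flight check can catch it early. A sketch using OpenCV; the file paths are placeholders and the tolerance value is an arbitrary assumption:

```python
import cv2

def check_pose_video(pose_path: str, ref_image_path: str) -> None:
    """Warn when the pose video's aspect ratio drifts from the subject image."""
    cap = cv2.VideoCapture(pose_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    cap.release()

    ref = cv2.imread(ref_image_path)  # assumes the file exists and is readable
    rh, rw = ref.shape[:2]

    print(f"pose video: {int(w)}x{int(h)} @ {fps:.1f} fps, image: {rw}x{rh}")
    if abs(w / h - rw / rh) > 0.05:  # arbitrary tolerance
        print("warning: aspect ratios differ; motion transfer may drift")

check_pose_video("inputs/pose_ref.mp4", "inputs/subject.png")
```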
KSamplerAdvanced (#64)
Executes denoising for the latent sequence. With a LightX2V LoRA loaded, you can keep steps low for fast previews while retaining motion coherence; increase steps when pushing for maximum detail. Scheduler choices affect motion smoothness versus crispness, and should be tuned together with LoRA usage as outlined for Wan in the Diffusers documentation.
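As a starting point, keeping two presets and switching between them works well; the specific numbers below are assumptions to tune against your own runs, not settings published with this workflow:

```python
# Hypothetical sampler presets for KSamplerAdvanced; tune to taste.
SAMPLER_PRESETS = {
    # Low step count leans on the LightX2V LoRA for coherence at preview speed.
    "preview": {"steps": 4, "cfg": 1.0, "sampler_name": "euler", "scheduler": "simple"},
    # More steps and a different scheduler when rendering the final pass.
    "final":   {"steps": 20, "cfg": 5.0, "sampler_name": "euler", "scheduler": "beta"},
}
```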
VHS_LoadVideo (#80)
Imports and scrubs your pose reference. Use its in‑node frame selection tools to pick the exact segment that matches your audio. Keeping framing and subject size consistent with the reference image stabilizes motion transfer. The node is part of VideoHelperSuite: ComfyUI‑VideoHelperSuite.
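To translate timestamps into the node's frame controls, a small helper can do the conversion (a sketch; skip_first_frames and frame_load_cap are VideoHelperSuite parameter names, so verify them against your installed version):

```python
def vhs_frame_window(start_s: float, end_s: float, fps: float) -> dict:
    """Convert a start/end time in seconds to VHS_LoadVideo frame settings."""
    skip = int(start_s * fps)
    cap = int((end_s - start_s) * fps)
    return {"skip_first_frames": skip, "frame_load_cap": cap}

# Load the 2.0 s - 7.5 s window of a 30 fps pose reference.
print(vhs_frame_window(2.0, 7.5, 30.0))  # {'skip_first_frames': 60, 'frame_load_cap': 165}
```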
…
Notes
Pose Control LipSync with Wan2.2 S2V in ComfyUI (Audio2Video) — see the RunComfy page for the latest node requirements.
Description
Initial release — Pose-Control-LipSync-Wan2.2-S2V.
