CivArchive
    Wan2.1 InfiniteTalk V2V Lipsync (GGUF) workflows - v1.1.3

    Important Notice

    Optimized to work with the latest version of ComfyUI (v0.18.1 + Frontend 1.42.8).

    Overview

    This workflow uses Wan2.1 InfiniteTalk to perform native V2V lip sync.
    Even if the input video is long, the workflow will automatically repeat the extension process as needed.

    What This Workflow Does

    Florence2Run + SAM2 automatic segmentation generates a face mask, and the masked region is then re-rendered with InfiniteTalk.
    This keeps motion outside the face faithful to the original video, while maintaining facial consistency and applying accurate lip sync.

    Notes

    If you encounter the following error:

    RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

    please change the audio_encoder from:

    wav2vec2-chinese-base_fp16.safetensors
    to
    wav2vec2-chinese-base_fp32.safetensors

    You can download it from the same location as the fp16 version.

    Depending on the original video's frame count, the output may be rounded down, resulting in the video being 1–3 frames shorter.

    The length is calculated from the latent frame count n using the formula:

    (n - 1) * 4 + 1

    Because of this rule, it is not possible to generate more frames than exist in the source video.

    For example, if the final chunk has 14 frames remaining, the selectable lengths would be 13 or 17.
    However, since frames 15–17 do not exist in the source video, they cannot be generated.
    As a result, the length is rounded down.
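    The rounding rule above can be sketched in a few lines of Python. This is a minimal illustration of the (n - 1) * 4 + 1 rule, not code taken from the workflow itself:

```python
def valid_lengths(max_frames):
    """All lengths of the form (n - 1) * 4 + 1 that fit within max_frames."""
    lengths = []
    n = 1
    while (n - 1) * 4 + 1 <= max_frames:
        lengths.append((n - 1) * 4 + 1)
        n += 1
    return lengths

def rounded_chunk_length(frames_remaining):
    """Largest valid length that does not exceed the remaining source frames."""
    return valid_lengths(frames_remaining)[-1]

# 14 source frames remain: the nearest valid lengths are 13 and 17,
# but 17 would require frames the source video does not have, so 13 is used.
print(valid_lengths(14))          # [1, 5, 9, 13]
print(rounded_chunk_length(14))   # 13
```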

    If anyone has a good idea to improve this limitation, suggestions are welcome.

    Description


    v1.1.3 : Minor fix

    FAQ

    Comments (16)

    404error404error · Mar 3, 2026

    For an 81-frame video, you recommend setting the chunk length to 77 frames.
    If the video has 126 frames, should the chunk length be set to 122 frames as well?

    javawock7618 (Author) · Mar 4, 2026

    For 126F videos, please still set the chunk length to 77F.

    WanInfiniteTalk is based on 25 FPS, so the input will be automatically forced from 126F to 197F.
    The final output will then be processed as 77 + 77 + 41, resulting in a 195F video.

    In my workflow, processing always runs at least twice using First + Extend.
    Since WanInfiniteTalk throws an error when a single chunk exceeds 81F, an 81F input is internally processed as 77 + 4, resulting in a 78F video. (The reason the final output video may be shorter is explained in the Notes section above.)

    Ideally, for an 81F video, the chunk length should also be set to 81 and completed in a single pass, but this is currently not supported.
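    The chunking described above can be sketched as follows. This is only an illustration, assuming a fixed 77F chunk size and the (n - 1) * 4 + 1 rounding rule from the Notes section; it is not the workflow's actual internal code:

```python
def plan_chunks(total_frames, chunk=77):
    """Split total_frames into fixed-size chunks; the final partial chunk is
    rounded down to the nearest valid (n - 1) * 4 + 1 length."""
    chunks = []
    remaining = total_frames
    while remaining >= chunk:
        chunks.append(chunk)
        remaining -= chunk
    if remaining > 0:
        # largest length of the form 4k + 1 that fits in the leftover frames
        chunks.append(((remaining - 1) // 4) * 4 + 1)
    return chunks

# 126F input resampled to 25 FPS becomes 197F; the plan is 77 + 77 + 41,
# so the final output is 195F, as described in the reply above.
print(plan_chunks(197))  # [77, 77, 41]
```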

    404error404error · Mar 4, 2026

    @javawock7618 I completed the process, but only the first 0.5 seconds have visible content. The remaining 4+ seconds are completely black frames.

    Do you have any suggestions on how to resolve this issue or adjust the settings to prevent the black output?

    javawock7618 (Author) · Mar 4, 2026

    @404error404error Unfortunately, I have not encountered this issue myself. However, based on similar reports, possible causes may include:

    Attention-related issues

    OOM (insufficient VRAM)

    It may or may not resolve the problem, but you can try:

    Disabling Sage-Attention (inside Model Loader SubGraph → Patch Sage Attention KJ)

    Reducing the video resolution and length (e.g., 480×360, 49F)

    I’m sorry, but I’m not the developer of the node, so I don’t have deeper technical insight.

    There is a post on Reddit reporting that disabling Sage-Attention improved the issue. You may find it helpful for reference:
    https://www.reddit.com/r/StableDiffusion/comments/1n3v2bk/infinitetalk_black_screen_issue_workflow_details/

    404error404error · Mar 4, 2026

    @javawock7618 Thank you very much for the information. However, I don’t seem to see these nodes available to toggle in this workflow. How should I proceed?

    javawock7618 (Author) · Mar 4, 2026

    @404error404error To disable Sage-Attention, follow these steps:

    Open the Model Loader (SubGraph) by clicking the top-right corner of the node.

    Expand Patch Sage Attention KJ by clicking the left side of the node.

    Change the setting from auto to disabled.

    Alternatively, you can disable Sage-Attention at the time you launch ComfyUI.
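    For the launch-time option: Sage-Attention in ComfyUI is opt-in via a command-line flag, so starting without it leaves the backend disabled. A sketch assuming the standard ComfyUI launch command:

```shell
# Sage-Attention is only active when ComfyUI is started with --use-sage-attention.
# Launching without the flag falls back to the default attention backend:
python main.py

# (for comparison, this would enable it)
# python main.py --use-sage-attention
```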

    404error404error · Mar 4, 2026

    @javawock7618 Thank you very much for the information. I tried it and it helped. However, the generation time has unfortunately increased significantly.

    javawock7618 (Author) · Mar 5, 2026

    @404error404error Okay. I’ve released a version that allows overriding the attention backend in the GGUF Model Loader, so please give it a try.

    Switching to xFormers may provide a slight speed improvement, although it may also have no noticeable effect.
    Set Model Loader → attention_override from none to xformers.

    404error404error · Mar 5, 2026

    @javawock7618 Thank you very much for your help. I will try this new version.

    404error404error · Mar 25, 2026

    Please update the workflow.
    The current workflow appears to conflict with the latest version of ComfyUI.

    Possible issues include:

    Compatibility needs to be rebuilt for ComfyUI 0.17.x/0.18.x.

    Video Combine widgets are not displayed in the front-end UI (bug).

    Personally, I have encountered broken connections within the graph, which I have to reconnect manually every time.

    javawock7618 (Author) · Mar 31, 2026

    I updated the workflow so that it works with the latest ComfyUI (v0.18.1 + Frontend 1.42.8).

    404error404error · Apr 4, 2026

    If the video is longer than 24 seconds, do I need to change any settings?

    javawock7618 (Author) · Apr 4, 2026

    No, it will automatically loop until it processes the full length of the video, so there is no need to change the settings. (However, please make sure to match the duration in Trim Audio Duration.)

    404error404error · Apr 4, 2026

    @javawock7618 Thank you very much for the information.
    However, I’m not sure why, but even though I provided a 24-second video, it still only outputs 6 seconds.

    javawock7618 (Author) · Apr 7, 2026

    @404error404error After investigating, it appears that the preview video inside the Extend group does not load the processed video, so it stops updating after the first loop iteration. The workflow itself operates correctly, and the final output video is normal once processing completes. I have now released a version in which this troublesome bug is fixed.

    404error404error · Apr 7, 2026

    @javawock7618 Thank you very much for the information. However, my final output video is indeed only 6 seconds long, while the original video is 24 seconds.

    Workflows
    Wan Video 14B i2v 480p

    Details

    Downloads
    354
    Platform
    CivitAI
    Platform Status
    Available
    Created
    2/15/2026
    Updated
    5/15/2026
    Deleted
    -

    Files

    wan21InfinitetalkV2V_v111.zip

    Mirrors

    HuggingFace (1 mirror)

    wan21InfinitetalkV2V_v11.zip

    Mirrors

    HuggingFace (1 mirror)
    CivitAI (1 mirror)

    wan21InfinitetalkV2V_v113.zip

    Mirrors

    HuggingFace (1 mirror)