CivArchive
    MoCha | Wan2_1_mocha-14B-preview_fp8_e4m3fn_scaled_KJ - Preview
    NSFW

    Transferred from Hugging Face.

    https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/MoCha
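    To fetch the checkpoint from a script, the repository path above can be turned into a direct download URL. The sketch below assumes Hugging Face's usual `resolve/<revision>` URL layout and a `.safetensors` file named after this model version. The exact filename is an assumption, so check the repository listing before relying on it.

```python
from urllib.parse import quote

REPO_ID = "Kijai/WanVideo_comfy_fp8_scaled"
SUBFOLDER = "MoCha"
# Assumed filename: the version name shown on this page plus a .safetensors suffix.
FILENAME = "Wan2_1_mocha-14B-preview_fp8_e4m3fn_scaled_KJ.safetensors"


def checkpoint_url(repo_id=REPO_ID, subfolder=SUBFOLDER, filename=FILENAME, revision="main"):
    """Build a direct-download URL for one file in a Hugging Face repository."""
    # Skip empty parts so a file at the repository root also works.
    path = "/".join(part for part in (subfolder, filename) if part)
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(path)}"


if __name__ == "__main__":
    print(checkpoint_url())
```

    The function only builds the URL and does not download anything, so it can be passed to whichever download tool is already in use.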

    Sample workflow: https://www.runninghub.ai/post/1980967242705854466

    Getting Started with MoCha

    To start your own character replacement with MoCha, the following three inputs are required:

    • Source Video: The original video with the character to be replaced.

    • Designation Mask for the First Frame: A mask marking the character to be replaced in the first frame of the Source Video.

    • Reference Images: Images of the new character on a clean background. We recommend uploading at least one high-quality, front-facing facial close-up.
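    Before launching a workflow, it can help to check that all three inputs are present. The following is a minimal sketch, not part of MoCha itself: the `MoChaInputs` class, its field names, and the example file paths are hypothetical and exist only to illustrate the checklist above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class MoChaInputs:
    """Hypothetical container for the three inputs listed above."""
    source_video: Path
    first_frame_mask: Path
    reference_images: List[Path] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the inputs look complete."""
        issues = []
        if not self.reference_images:
            issues.append("at least one reference image is required")
        named = [("source video", self.source_video),
                 ("first-frame mask", self.first_frame_mask)]
        named += [("reference image", image) for image in self.reference_images]
        for label, path in named:
            if not path.is_file():
                issues.append(f"{label} not found: {path}")
        return issues


if __name__ == "__main__":
    # Example paths are placeholders; replace them with real files.
    inputs = MoChaInputs(Path("source.mp4"), Path("mask_frame0.png"), [Path("ref_face.png")])
    for issue in inputs.problems():
        print("input problem:", issue)
```

    The check only confirms that the files exist. It does not verify that the mask matches the first frame or that the reference images have a clean background.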

    Abstract

    End-to-End Video Character Replacement without Structural Guidance.

    Controllable video character replacement with a user-provided identity remains a challenging problem due to the lack of qualified paired-video data. Prior works have predominantly adopted a reconstruction-based paradigm reliant on per-frame masks and explicit structural guidance (e.g., pose, depth). This reliance, however, renders them fragile in complex scenarios involving occlusions, rare poses, character-object interactions, or complex illumination, often resulting in visual artifacts and temporal discontinuities. In this paper, we propose MoCha, a novel framework that bypasses these limitations: it requires only a single first-frame mask and re-renders the character by unifying different conditions into a single token stream. Further, MoCha adopts a condition-aware RoPE to support multi-reference images and variable-length video generation. To overcome the data bottleneck, we construct a comprehensive data synthesis pipeline to collect qualified paired-training videos. Extensive experiments show that our method substantially outperforms existing state-of-the-art approaches.

    For details about the model, see: https://orange-3dv-team.github.io/MoCha/

    Qualitative Performance

    Cartoon Character Replacement

    MoCha generates high-fidelity videos when conditioned on cartoon character reference images.

    Real-Person Character Replacement

    MoCha also performs well in replacing real-person characters in source videos.

    Scene Illumination Consistency

    Compared with existing work, MoCha better preserves the lighting and color tone of the original video, so the replacement character blends more naturally into the scene. MoCha can also handle complex lighting conditions, such as shaking lights and strong backlighting.

    Precise Action Preservation

    MoCha can accurately replicate the actions and expressions of the original video, even in complex scenarios involving fast movements and object interactions. This ensures that the generated character video maintains high fidelity to the source performance.

    Checkpoint: Wan Video 14B t2v

    Details

    Downloads: 146
    Platform: SeaArt
    Platform Status: Available
    Created: 10/22/2025
    Updated: 10/22/2025
    Deleted: -
