MoCha | Wan2_1_mocha-14B-preview_fp8_e4m3fn_scaled_KJ

Transferred from Hugging Face.

https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/MoCha
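For scripted downloads, a direct file URL can be built from the repository path above. The following is a minimal sketch, not an official download method: the repository ID and folder come from the link above, but the file name (the model title plus a .safetensors extension) is an assumption and should be checked against the file list on the Hugging Face page. The helper name resolve_url is hypothetical.

```python
from urllib.parse import quote

REPO_ID = "Kijai/WanVideo_comfy_fp8_scaled"  # taken from the link above
SUBFOLDER = "MoCha"                          # taken from the link above
# Assumption: file name = model title + ".safetensors"; verify on the repo page.
FILENAME = "Wan2_1_mocha-14B-preview_fp8_e4m3fn_scaled_KJ.safetensors"


def resolve_url(repo_id: str, subfolder: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL using Hugging Face's /resolve/ path pattern."""
    path = quote(f"{subfolder}/{filename}")  # "/" is left unescaped by default
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{path}"


if __name__ == "__main__":
    print(resolve_url(REPO_ID, SUBFOLDER, FILENAME))
```

The printed URL can be passed to any download tool. If the assumed file name is wrong, the request will fail, so confirm it against the repository listing first.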

Sample workflow: https://www.runninghub.ai/post/1980967242705854466

Getting Started with MoCha

To start your own character replacement with MoCha, the following three inputs are required:

Source Video: The original video containing the character to be replaced.
Designation Mask for the First Frame: A mask marking the character to be replaced in the first frame of the Source Video.
Reference Images: Images of the new character on a clean background. We recommend uploading at least one high-quality, front-facing facial close-up.
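The three inputs above can be collected and sanity-checked before running a workflow. The sketch below is illustrative only: the MoChaInputs class, its field names, and the box_mask helper are hypothetical and not part of MoCha or any workflow tool. The rectangular mask is a crude stand-in; a real designation mask would normally follow the character's silhouette.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MoChaInputs:
    """Hypothetical container for the three required inputs."""
    source_video: str            # path to the original video
    first_frame_mask: str        # path to the mask for the first frame
    reference_images: List[str]  # paths to images of the new character

    def validate(self) -> None:
        if not self.source_video:
            raise ValueError("a source video is required")
        if not self.first_frame_mask:
            raise ValueError("a first-frame mask is required")
        if not self.reference_images:
            raise ValueError("at least one reference image is required")


def box_mask(width: int, height: int, box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Return a height x width mask: 255 inside the box (x0, y0, x1, y1), 0 elsewhere.

    The box is half-open: x0 <= x < x1 and y0 <= y < y1.
    """
    x0, y0, x1, y1 = box
    return [
        [255 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    inputs = MoChaInputs("source.mp4", "mask.png", ["face_closeup.png"])
    inputs.validate()
    for row in box_mask(6, 4, (1, 1, 4, 3)):
        print(row)
```

Only the first frame needs a mask, so a single two-dimensional array is enough here; no per-frame mask sequence is built.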

Abstract

End-to-End Video Character Replacement without Structural Guidance.

Controllable video character replacement with a user-provided character remains a challenging problem due to the lack of qualified paired-video data. Prior works have predominantly adopted a reconstruction-based paradigm reliant on per-frame masks and explicit structural guidance (e.g., pose, depth). This reliance, however, renders them fragile in complex scenarios involving occlusions, rare poses, character-object interactions, or complex illumination, often resulting in visual artifacts and temporal discontinuities. In this paper, we propose MoCha, a novel framework that bypasses these limitations: it requires only a single first-frame mask and re-renders the character by unifying different conditions into a single token stream. Further, MoCha adopts a condition-aware RoPE to support multi-reference images and variable-length video generation. To overcome the data bottleneck, we construct a comprehensive data synthesis pipeline to collect qualified paired-training videos. Extensive experiments show that our method substantially outperforms existing state-of-the-art approaches.

See details about the model: https://orange-3dv-team.github.io/MoCha/

Qualitative Performance

Cartoon Character Replacement

MoCha generates high-fidelity videos when conditioned on cartoon character reference images.

Real-Person Character Replacement

MoCha also performs well in replacing real-person characters in source videos.

Scene Illumination Consistency

Compared with existing methods, MoCha better preserves the lighting and color tone of the original video, so the replacement character blends more naturally into the scene. Furthermore, MoCha can handle complex lighting conditions, such as shaking lights and strong backlighting.

Precise Action Preservation

MoCha can accurately replicate the actions and expressions of the original video, even in complex scenarios involving fast movements and object interactions. This ensures that the generated character video maintains high fidelity to the source performance.
