(NSFW) Dead-Simple MMAudio + RIFE Interpolation Setup for WAN 2.2 I2V 14B

(NSFW) Dead-Simple MMAudio + RIFE Interpolation Setup for WAN 2.2 I2V 14B - v1.0.2

Changelog

Version 1.0.3: Connected both steps so no more re-uploading is required. Just upload your video in Step 1 and hit Run.

Version 1.0.2: Changed VHS nodes to VHS ffmpeg nodes to avoid color drift (thank you LastAssignment). Also changed FPS flow from 24 to 25 to more closely align to MMAudio specs.

Version 1.0.1: The RIFE group output was set to 8fps by accident. Changed it to 24fps.

Version 1.0: Initial release

A TRIBUTE TO GOONERS EVERYWHERE

Your WAN 2.2 video is great. It looks awesome. But where's the sound? We moved from images to videos, and WAN 2.2 is incredible for video. The missing piece...AUDIO!

This is my first article ever, so I'm sorry if I made any mistakes. Please leave a comment if I've made an error or if you need any help. For your reference, I'm running:

ComfyUI 0.3.68
Torch 2.9
CUDA 13
Python 3.13.9
Sage Attention 2.2
NVIDIA RTX 5070 Ti (16GB VRAM)

And here are the custom nodes (3 in total):

ComfyUI-VideoHelperSuite 1.7.7 (https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite)
ComfyUI-MMAudio Nightly (https://github.com/kijai/ComfyUI-MMAudio)
- I recommend manually git cloning this node pack into your /ComfyUI/custom_nodes folder and then installing the requirements.txt file using your embedded Python. I'm on portable Comfy, so the command would look something like this:
  - "C:\ComfyUI\python_embeded\python.exe" -m pip install -r "C:\ComfyUI\ComfyUI\custom_nodes\ComfyUI-MMAudio\requirements.txt"

ComfyUI-VFI Unknown (https://github.com/GACLove/ComfyUI-VFI)
- I think there's a more popular RIFE custom node that a lot of other people use, but I couldn't figure out how to get fractional multiples for interpolation with it (16 -> 25fps is a ~1.56x interpolation). This node allows it.
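
To make the fractional multiple concrete, here's a small Python sketch of the arithmetic. It's only an illustration of the numbers, not how ComfyUI-VFI works internally, and the function name is made up for this example.

```python
from fractions import Fraction

def interpolation_plan(source_fps: int, target_fps: int, n_output: int):
    """Return the fps multiplier and, for each output frame, its position
    on the source timeline measured in source frames (hypothetical helper)."""
    multiplier = Fraction(target_fps, source_fps)
    # Output frame k is shown at k / target_fps seconds, which is
    # k * source_fps / target_fps source frames into the clip.
    positions = [Fraction(k * source_fps, target_fps) for k in range(n_output)]
    return multiplier, positions

multiplier, positions = interpolation_plan(16, 25, 6)
print(float(multiplier))              # 1.5625
print([float(p) for p in positions])  # [0.0, 0.64, 1.28, 1.92, 2.56, 3.2]
```

Only every 25th output frame lands exactly on a source frame. Everything in between sits at an odd fractional position, which is why a node limited to whole-number multiples (2x, 3x, ...) can't hit 25fps directly.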

Onto the workflow...

------------------------------------

This workflow handles two jobs:

Fix WAN 2.2’s native 16fps output by interpolating it to 25fps with RIFE.
Generate synced audio with MMAudio using the final 25fps video.

The setup is plug-and-play. Drop in your WAN video → interpolate → feed it into MMAudio → get synced output. The included notes explain the reasoning for FPS, step settings, and seed behavior.

What this workflow covers:

RIFE interpolation from 16 → 25 fps.
MMAudio sampler
1. After some further testing, 50-100 steps work well. The node runs pretty fast in general, so it's also worth experimenting with CFG (4.5-8). 100 steps at CFG 8 work well for high-quality output and better prompt adherence.
Automatic audio + video combine at 25fps.
Optional re-interpolation afterward if you want 30fps+ output.
1. You can plug your finished 25fps video into the 'Step 1: Rife Interpolation' group and just change the 'source_fps' to 25 and the 'target_fps' to 30.
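
As a rough sanity check on the optional second pass, the sketch below estimates how many frames each stage should produce, assuming the interpolator keeps the clip duration fixed. The 81-frame clip length is my assumption for illustration, the helper name is hypothetical, and the exact count a given node outputs may differ by a frame.

```python
def frames_after_interpolation(n_frames: int, source_fps: float, target_fps: float) -> int:
    """Estimate the output frame count when clip duration is preserved
    (hypothetical helper, not part of any node pack)."""
    duration_s = n_frames / source_fps
    return round(duration_s * target_fps)

wan_frames = 81  # assumed clip length, for illustration only
at_25 = frames_after_interpolation(wan_frames, 16, 25)  # Step 1: 16 -> 25 fps
at_30 = frames_after_interpolation(at_25, 25, 30)       # optional: 25 -> 30 fps
print(at_25, at_30)  # 127 152
```

The second pass is a 1.2x multiple (30 / 25), so it needs the same fractional interpolation support as the first one.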

Required MMAudio files

Download all of these into:

ComfyUI/models/mmaudio

MMAudio NSFW Model (fine-tuned off the base model)

https://huggingface.co/phazei/NSFW_MMaudio/resolve/main/mmaudio_large_44k_nsfw_gold_8.5k_final_fp16.safetensors?download=true

MMAudio VAE (fp16)

https://huggingface.co/Kijai/MMAudio_safetensors/resolve/5984623e6b436818c6ff287ef6eec93e3e05aa3f/mmaudio_vae_44k_fp16.safetensors

MMAudio Synchformer (fp16)

https://huggingface.co/Kijai/MMAudio_safetensors/resolve/main/mmaudio_synchformer_fp16.safetensors

MMAudio CLIP Encoder (fp16)

https://huggingface.co/Kijai/MMAudio_safetensors/resolve/main/apple_DFN5B-CLIP-ViT-H-14-384_fp16.safetensors

Nvidia BigVGAN v2 44kHz 128band 512x

This seems to be required for MMAudio to work. You can manually download all the files, git clone the repo, or use the Hugging Face CLI tool (huggingface-cli download nvidia/bigvgan_v2_44khz_128band_512x --local-dir followed by your target folder). The repo should be placed in the ComfyUI/models/mmaudio folder.

https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x
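
If you want to double-check that the downloads landed in the right place, here's a minimal Python sketch that looks for the four .safetensors files. The file names come from the download URLs above. The folder path is an assumption based on a portable install, so adjust it to your setup. It doesn't check the BigVGAN repo folder.

```python
from pathlib import Path

# File names taken from the download URLs above.
REQUIRED_FILES = [
    "mmaudio_large_44k_nsfw_gold_8.5k_final_fp16.safetensors",
    "mmaudio_vae_44k_fp16.safetensors",
    "mmaudio_synchformer_fp16.safetensors",
    "apple_DFN5B-CLIP-ViT-H-14-384_fp16.safetensors",
]

def missing_mmaudio_files(model_dir):
    """Return the required file names not found directly inside model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]

if __name__ == "__main__":
    # Assumed install location; change this to match your own setup.
    folder = Path(r"C:\ComfyUI\ComfyUI\models\mmaudio")
    missing = missing_mmaudio_files(folder)
    if missing:
        print("Missing from", folder)
        for name in missing:
            print("  -", name)
    else:
        print("All four MMAudio files found in", folder)
```

Run it with the same embedded Python you used for the requirements install. An empty result means all four files are where the loader nodes expect them.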

Bonus

Once you've created a good MMAudio track, there are some further steps you can take depending on what you'd like to create.

1. Import your audio/video into some type of software (CapCut/Shotcut) and layer on some music in the background. I've done this with a few of my videos. I added a 'radio' filter to make it seem like the music was kinda tinny and playing in the background.

2. Layer other audio tracks alongside the NSFW audio track. You can see KaptainSisay very elegantly did something like that here (https://civarchive.com/images/110700679)

Comments (154)

texaspartygirl, Dec 11, 2025

yessss, this is the missing piece! I wonder how I would train this on my own dataset, is video used or just audio? are there tuts for this?

Files

NSFWDeadSimpleMmaudioRIFE_v102.zip