CivArchive
    (NSFW) Dead-Simple MMAudio + RIFE Interpolation Setup for WAN 2.2 I2V 14B - v1.0.1
    NSFW

    Changelog

    Version 1.0.3: Connected both steps so no more re-uploading is required. Just upload your video in Step 1 and hit Run.

    Version 1.0.2: Changed VHS nodes to VHS ffmpeg nodes to avoid color drift (thank you LastAssignment). Also changed FPS flow from 24 to 25 to more closely align to MMAudio specs.

    Version 1.0.1: RIFE Group output was set to 8fps by accident. Changed it to 24fps

    Version 1.0: Initial release

    A TRIBUTE TO GOONERS EVERYWHERE

    Your WAN 2.2 video is great. It looks awesome. But where's the sound? We moved from images to videos, and WAN 2.2 is incredible for video. The missing piece...AUDIO!

    This is my first article ever, so I'm sorry if I made any mistakes. Please leave a comment if I've made an error or if you need any help. For your reference, I'm running:

    • ComfyUI 0.3.68

    • Torch 2.9

    • CUDA 13

    • Python 3.13.9

    • Sage Attention 2.2

    • NVIDIA 5070 Ti (16gb vram)

    And here are the custom nodes (3 in total):

    • ComfyUI-VideoHelperSuite 1.7.7 (https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite)

    • ComfyUI-MMAudio Nightly (https://github.com/kijai/ComfyUI-MMAudio)

      • I recommend manually git cloning this node pack into your /ComfyUI/custom_nodes folder and then installing its requirements.txt file using your embedded Python. I'm on portable Comfy, so the command would look something like this:

        • "C:\ComfyUI\python_embeded\python.exe" -m pip install -r "C:\ComfyUI\ComfyUI\custom_nodes\ComfyUI-MMAudio\requirements.txt"

    • ComfyUI-VFI Unknown (https://github.com/GACLove/ComfyUI-VFI)

      • I think there's a more popular RIFE custom node that a lot of other people use, but I couldn't figure out how to get fractional multiples for interpolation with it (16 -> 25fps is a ~1.56x interpolation). This node allows it.
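    To make the fractional ratio concrete, here is a minimal sketch of how output frames at 25fps line up against 16fps source frames. It only illustrates the arithmetic; it is not how ComfyUI-VFI is implemented, and the function name `interpolation_plan` is made up for this example.

```python
from fractions import Fraction

def interpolation_plan(num_src_frames, src_fps=16, dst_fps=25):
    """For each output frame, return (left source index, blend position in [0, 1))."""
    step = Fraction(src_fps, dst_fps)  # source frames advanced per output frame
    plan = []
    i = 0
    # Stop once the output position would pass the last source frame.
    while i * step <= num_src_frames - 1:
        pos = i * step
        left = int(pos)  # floor, since pos is never negative
        plan.append((left, float(pos - left)))
        i += 1
    return plan

plan = interpolation_plan(17)  # 17 source frames = exactly 1 second at 16fps
print(len(plan))               # 26 output frames
print(plan[:3])                # [(0, 0.0), (0, 0.64), (1, 0.28)]
```

    Most output frames fall between two source frames, which is why a node limited to whole-number multipliers (2x, 3x) can't hit 25fps directly.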

    Onto the workflow...

    ------------------------------------

    This workflow handles two jobs:

    1. Fix WAN 2.2’s native 16fps output by interpolating it to 25fps with RIFE.

    2. Generate synced audio with MMAudio using the final 25fps video.

    The setup is plug-and-play. Drop in your WAN video → interpolate → feed it into MMAudio → get synced output. The included notes explain the reasoning for FPS, step settings, and seed behavior.

    What this workflow covers:

    1. RIFE interpolation from 16 → 25 fps.

    2. MMAudio sampler

      1. Upon some further testing, 50-100 steps works well. The node runs pretty fast in general, and it's also worthwhile toying with CFG (4.5 - 8). 100 steps and CFG 8 works well for high-quality output and better prompt adherence.

    3. Automatic audio + video combine at 25fps.

    4. Optional re-interpolation afterward if you want 30fps+ output.

      1. You can plug your finished 25fps video into the 'Step 1: Rife Interpolation' group and just change the 'source_fps' to 25 and the 'target_fps' to 30.

    Required MMAudio files

    Download all of these into:

    ComfyUI/models/mmaudio

    MMAudio NSFW Model (fine-tuned off the base model)

    https://huggingface.co/phazei/NSFW_MMaudio/resolve/main/mmaudio_large_44k_nsfw_gold_8.5k_final_fp16.safetensors?download=true

    MMAudio VAE (fp16)

    https://huggingface.co/Kijai/MMAudio_safetensors/resolve/5984623e6b436818c6ff287ef6eec93e3e05aa3f/mmaudio_vae_44k_fp16.safetensors

    MMAudio Synchformer (fp16)

    https://huggingface.co/Kijai/MMAudio_safetensors/resolve/main/mmaudio_synchformer_fp16.safetensors

    MMAudio CLIP Encoder (fp16)

    https://huggingface.co/Kijai/MMAudio_safetensors/resolve/main/apple_DFN5B-CLIP-ViT-H-14-384_fp16.safetensors


    Nvidia BigVGAN v2 44kHz 128band 512x

    This seems to be required for MMAudio to work. You can manually download all the files, git clone the repo, or use the HuggingFace CLI tool (huggingface-cli download nvidia/bigvgan_v2_44khz_128band_512x --local-dir followed by your target folder). The repo should be placed in the ComfyUI/models/mmaudio folder.

    https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x
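    If you want to double-check the downloads before loading the workflow, a small script along these lines can list what's missing. The file names are taken from the links above. The BigVGAN folder name is an assumption (I'm using the repo name), so adjust it if your clone is named differently.

```python
from pathlib import Path

# File names copied from the download links in this article.
REQUIRED_FILES = [
    "mmaudio_large_44k_nsfw_gold_8.5k_final_fp16.safetensors",
    "mmaudio_vae_44k_fp16.safetensors",
    "mmaudio_synchformer_fp16.safetensors",
    "apple_DFN5B-CLIP-ViT-H-14-384_fp16.safetensors",
]
# Assumption: the BigVGAN repo sits in a folder named after the repo.
BIGVGAN_DIR = "bigvgan_v2_44khz_128band_512x"

def missing_mmaudio_items(mmaudio_dir):
    """Return the names of required files/folders not found in mmaudio_dir."""
    root = Path(mmaudio_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not (root / BIGVGAN_DIR).is_dir():
        missing.append(BIGVGAN_DIR + "/")
    return missing

if __name__ == "__main__":
    # Adjust this path to your own install.
    for item in missing_mmaudio_items("ComfyUI/models/mmaudio"):
        print("missing:", item)
```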

    Bonus

    Once you've created a good MMAudio track, there are some further steps you can take depending on what you'd like to create.

    1. Import your audio/video into some type of software (CapCut/Shotcut) and layer on some music in the background. I've done this with a few of my videos. I added a 'radio' filter to make it seem like the music was kinda tinny and playing in the background.

    2. Layer other audio tracks alongside the NSFW audio track. You can see KaptainSisay very elegantly did something like that here (https://civarchive.com/images/110700679)

    Description

    RIFE Group output was set to 8fps by accident. Changed it to 24fps

    Comments (115)

    huguesbon590Nov 18, 2025
    CivitAI

    Great! Does it support low ram gpu config?

    SeoulSeeker
    Author
    Nov 18, 2025

    Depends on how much VRAM you have. I'm measuring like 2-4GB VRAM usage for the RIFE interpolation, and about 7-12gb VRAM usage for the MMAudio. That may vary a lot depending on your setup.

    But worth giving it a shot and see what happens. If you get an oom error, I might try to figure out how to release a low-VRAM version with some blockswapping or something.

    lollyjamNov 18, 2025· 1 reaction
    CivitAI

    The normal mmaudio models work fine, but for some reason the nsfw-gold model throws the following error:

    Error(s) in loading state_dict for MMAudio: Missing key(s) in state_dict: "clip_input_proj.2.w1.weight", "clip_input_proj.2.w2.weight", "clip_input_proj.2.w3.weight", "text_input_proj.2.w1.weight", "text_input_proj.2.w2.weight", "text_input_proj.2.w3.weight". Unexpected key(s) in state_dict: "clip_input_proj.1.w1.weight", "clip_input_proj.1.w2.weight", "clip_input_proj.1.w3.weight", "text_input_proj.1.w1.weight", "text_input_proj.1.w2.weight", "text_input_proj.1.w3.weight". size mismatch for t_embed.mlp.0.weight: copying a param with shape torch.Size([896, 256]) from checkpoint, the shape in current model is torch.Size([896, 896]).

    I'd appreciate any help!

    3621282Nov 18, 2025· 1 reaction

    Update Kijai's MMaudio custom node.

    lollyjamNov 18, 2025· 1 reaction

    @KaptainSisay Thanks, that fixed it!
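    For anyone wondering what that error is saying: the layer names the installed node code expects and the names stored in the checkpoint don't line up (same layers, different index), which fits an outdated node pack. Below is a rough sketch of how a missing/unexpected report like this can be derived with plain sets. It is not PyTorch's actual implementation, and the key lists are shortened from the error message above.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, like a strict state_dict load reports."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model wants them, file lacks them
    unexpected = sorted(checkpoint_keys - model_keys)   # file has them, model doesn't know them
    return missing, unexpected

# Shortened from the error above: the outdated node code expects index 2,
# while the checkpoint stores the same layers under index 1.
model_keys = ["clip_input_proj.2.w1.weight", "text_input_proj.2.w1.weight", "t_embed.mlp.0.weight"]
checkpoint_keys = ["clip_input_proj.1.w1.weight", "text_input_proj.1.w1.weight", "t_embed.mlp.0.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```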

    9517554Nov 18, 2025· 2 reactions
    CivitAI

    Thank you for introducing me to MMAudio. Sound makes videos so much better.

    SeoulSeeker
    Author
    Nov 18, 2025

    You're welcome friend!

    gkirNov 18, 2025
    CivitAI

    Hi! Thanks for the workflow. What prompt should I write? How does the audio generation work?

    SeoulSeeker
    Author
    Nov 18, 2025· 1 reaction

    The prompt should describe what is happening in the video. You can use AI models such as Grok (which is usually uncensored) and drag your image into it. Ask it to create an audio prompt based on the image. Also, depending on what is happening in the video, you can add things you don't want to hear to the negative prompt. So if you have a video that isn't fully NSFW (for example, straight sex), try adding "moans" or similar terms to the negative prompt. The NSFW version of MMAudio tends to lean toward sexual sounds, which you'll have to play around with a bit.

    I also recommend running the MMAudio step at least a few times, since the results can vary a lot. You might run it 4 or 5 times and get garbage, and then the sixth run turns out really good.

    gkirNov 19, 2025· 1 reaction

    @SeoulSeeker Thanks a lot, it works for me even without a prompt!

    SeoulSeeker
    Author
    Nov 19, 2025· 1 reaction

    @gkir Glad to hear it!

    borrawNov 18, 2025
    CivitAI

    10 tries, and the sound is always terrible and out of sync with the video. I hope they improve the model soon.

    SeoulSeeker
    Author
    Nov 18, 2025· 1 reaction

    Sorry it's sucking for you. Mind sharing some more details and maybe we can make it better together?

    ItsThatTimeAgainNov 19, 2025· 1 reaction

    Sounds like the video was not converted to 24fps for mmaudio

    borrawNov 19, 2025

    @SeoulSeeker maybe, but the model is bad at performing sex acts. Your workflow is good, and it has nothing to do with it😄

    kibermeNov 19, 2025· 1 reaction

    Same actually. Just soured my taste for couple of my favourite clips. Results are laughable and I was bound to listen to hot girls talking simglish for a while lol. The workflow is fine though.

    3621282Nov 19, 2025

    If your video is not 24fps, use the comfy load and save nodes option to force 24fps, otherwise it'll be out of sync.

    barrybelmontNov 19, 2025· 1 reaction
    CivitAI

    Thank you!

    SeoulSeeker
    Author
    Nov 19, 2025

    Thank you for your kind comment!

    piecesofshiNov 19, 2025· 1 reaction
    CivitAI

    Thank you, I've been testing it, and I've been laughing my ass off for a while with these results xOx

    SeoulSeeker
    Author
    Nov 19, 2025

    You can definitely get some hilarious outputs lol

    zczcgNov 19, 2025

    is really useful?

    SeoulSeeker
    Author
    Nov 19, 2025· 1 reaction

    @zczcg definitely! can take a bit of tweaking but if you want a proper video with audio, it's worthwhile

    zczcgNov 19, 2025

    @SeoulSeeker I test it, I have a good result,only the shortcoming is the voice sounds too old

    SeoulSeeker
    Author
    Nov 19, 2025

    @zczcg keep tweaking the prompts. try 'old' in the negative and add some more in the positive to push towards a younger voice

    piecesofshiNov 19, 2025· 1 reaction

    @SeoulSeeker Exactly, that's the easy fix, and putting the cfg at 8 reinforces the desired result.

    zczcgNov 19, 2025

    i will try it.thanks

    zczcgNov 19, 2025

    @SeoulSeeker add some more, you mean:"young voice" like this?

    piecesofshiNov 19, 2025· 1 reaction

    @zczcg use that and emphasize on it, and on the negative side put, " Old, Old woman," and the cfg on 8 and it should do the trick, also try the "fixed" seeds starting on 1 and so on, idk why but it helps a little more in some videos.

    uqwNov 19, 2025· 1 reaction
    CivitAI

    Works real good with 100 steps (takes only 17 secs on my 4060Ti)

    SeoulSeeker
    Author
    Nov 19, 2025· 5 reactions
    CivitAI

    Tiny update: the guy on HuggingFace who released this NSFW MMAudio model (Cloud19) is doing a new release with ThinkSound at some point. There apparently is an MMAudio v2, but the authors never provided the training script. Hopefully this new model is a bit more adaptable:

    https://huggingface.co/cloud19/NSFW_MMaudio/discussions/3#691a644baa12384dedc23638

    borrawNov 19, 2025
    CivitAI

    Can I ask what checkpoint/LoRA you used to create the preview images? I really like the composition and atmosphere.

    SeoulSeeker
    Author
    Nov 19, 2025· 1 reaction

    For sure man, I'll upload the base images later today so you can see.

    SeoulSeeker
    Author
    Nov 19, 2025

    https://civitai.com/posts/24521458
    https://civitai.com/posts/24521548
    https://civitai.com/posts/24521624

    I hope this is the specific style you were talking about. Let me know if you need references for anything else or help with dialing in the style!

    ZalnorNov 20, 2025· 1 reaction
    CivitAI

    Excellent! Simple and effective!

    redlittlerabbitNov 21, 2025
    CivitAI

    I'm glad AI moves so fast so I don't have to look like an idiot for asking questions.

    1. This generates your video, and then also generates the audio?? Those sex noises were made with this?

    2. Is there a stand alone audio generator can feed pre made videos into?

    SeoulSeeker
    Author
    Nov 21, 2025· 1 reaction

    No question is stupid!

    1. This takes in a video you've already made and can do two things

    - Interpolate the frame rate from 16 to 24 fps (required for audio generation with MMAudio)
    - Generate an audio track to go along with your video

    2. I hope this answer ^ answers this question haha

    redlittlerabbitNov 21, 2025· 1 reaction

    @SeoulSeeker Well then you leave me no choice! I have to try it out.

    Thanks!

    tenstripNov 21, 2025· 4 reactions
    CivitAI

    75-100 steps does seems to be on average better and more responsive. From my use so far I've noticed the prompt adherence is there but not in the way that is super controllable. I've seen results from tokens like high pitched voice, heavy breathing, wet, sloppy, impact, forceful impact, closed mouth. Natural language doesn't seem to matter. Any kind of speech prompt or the name of a language produces artifacted speech. Negatives: music, muffled, low quality. Orgasm in the negative actually seems to cut down on a lot of moaning. Male voices can sometimes appear during a facial only. The input video matters the most and I've noticed you only get that skin contact sound and plops from very fast spurts of defined motion. I think the largest improvement comes from a further and more widely trained audio model, nothing to really extract from the workflow. If you get a few good outputs you can splice them in to something definitely usable and add foley on top of it, this is more of a foley clip generator anyways best used if you are going to postwork and edit videos. I've been adding pre-interpolated 50fps videos straight to the input and they seem to work as well as 24 fps, sometimes the sync needs to be adjusted +/- 3% to line up.

    SeoulSeeker
    Author
    Nov 21, 2025

    Thank you for your detailed comment and notes!

    DaAlbNov 30, 2025

    Hi, when you use 50fps in the input, what did you put in the force_rate? I am getting bad results with 32fps input.

    Thanks

    tenstripNov 30, 2025

    @DaAlb You need to force at 24fps or it won't work. I started exporting 5s segments at 24fps out of my video edits and using it that way only saving the mp3 result.

    DaAlbNov 30, 2025

    @tenstrip thanks a lot

    SailorofHaliNov 22, 2025· 1 reaction
    CivitAI

    Oh nice, already had my own mmaudio workflow but didn't know there was an interpolator that could go right to 24fps, I was reencoding my 32fps outputs and dropping frames to 24 before putting them into mmaudio before remerging the audio with the 32fps original.

    SeoulSeeker
    Author
    Nov 22, 2025

    I was doing the same thing before I found this RIFE node. The other, more commonly used node only does whole number frame interpolations (2x, 3x) which obviously doesn't work going from 16 to 24fps. Only other option I've found is to use Topaz (if you have it)

    bluenightlagoonNov 24, 2025
    CivitAI

    Is there a reason these files go into "/models/mmaudio" instead of "/models/audio_encoders"?

    SeoulSeeker
    Author
    Nov 24, 2025· 1 reaction

    No particular reason, it's just how Kijai configured the MMAudio node pack:

    https://github.com/kijai/ComfyUI-MMAudio

    bluenightlagoonNov 24, 2025· 1 reaction

    @SeoulSeeker ahh ok. Thank you. Looking forward to trying this :)

    bluenightlagoonNov 24, 2025· 1 reaction

    @SeoulSeeker Oh wow, this is one of the few workflows that actually works without big problems. This is awesome! So first step works really well. Increasing the frames of the videos is really easy to get running with this Workflow!

    The second function is almost as simple. There is one potential problem where comfyui can't find the mmaudio folder. Took me a while to figure that out, but that has to be added to the extra_models_config.json or extra_model_path.json. I think this is because I don't have comfyui on C:\ drive because I need more space for comfyui so I added to both files the line and after this and a restart it worked:
    mmaudio: Y:\programs\ComfyUI\resources\ComfyUI\models\mmaudio

    SeoulSeeker
    Author
    Nov 24, 2025

    @bluenightlagoon really glad to hear that, it's pretty simple and I want as many people to enjoy it as possible. And glad you got your issue figured out!

    bluenightlagoonNov 24, 2025· 1 reaction
    CivitAI

    Great workflow! One of the few workflows here that I actually got working as a comfyui beginner. Just started using comfyui for the first time 2 days ago :D I wish all workflows were this simple!

    Ponder_StibbonsNov 24, 2025· 1 reaction

    Most of the crazy all-in-one WFs that abound here are 90% bells and whistles, landmine bugs waiting to troll you. I'm guilty of this myself; it's not that automation isn't useful, it's just super intimidating the first time you wade in and nothing works. I promise you it gets easier. I know it's pedantic to say, but I didn't really understand comfy until I started building from scratch, or from basic skeletons. It really works. The core of all of them always boils down to a few key nodes. And you can dive into complex logic without tying your brain in knots over screwing up the python syntax. Or even knowing any python, for that matter. Also, as you build and test, you get an intuitive feel for the execution order, which was one of my biggest sources of confusion when I started. It's a great playground. Anyway, just wanted to encourage you to keep at it, it's such a great environment. Oh, if you haven't already- take a snapshot of your custom nodes, and make a backup of the whole comfy git folder (you can leave out models, input, output). When you inevitably need a clean install, having this prepared turns a nightmare weekend of forensics into a half-hour breeze.

    Ponder_StibbonsNov 24, 2025· 3 reactions
    CivitAI

    Thanks for posting. The output is better since the last time I played with it. The nsfw clip does a good job, even without a prompt. I'd still use a prompt though, unless I can't think of how to describe what I want. The 24fps requirement is really annoying but I've found a tolerable workaround for existing higher framerate clips (stuff that's already been interpolated). For anyone who is interested, my process for using this WF to add audio gen to finished 60fps videos is the following: throw the video into a saved premiere template sequence set to 24fps, slide the in/out bar to preset markers (10-15 sec intervals), quick export the sections to a folder and run a forloop batch (the kind I use in my reactor and controlnet batch WFs). Then the sections return to premiere, back to back, apply rate stretch to correct discrepancy (usually 8 frames or so) and some short crossfades. Done. You can do a lot of this in comfy, but if you're already experienced with editing software this isn't too much hassle. Of course if you're not starting with ridiculous 4 minute videos, you can just save a 15 sec 60fps clip to 24fps in shutter encoder or any other free software. Don't try meta batch manager; it won't work, you will just get one video repeating at the length of the batch you set, with the audio overwriting the first section. Thanks again for sharing, WF is short and sweet, and easily slipped into existing flows.

    SeoulSeeker
    Author
    Nov 24, 2025

    Thank you for your useful notes!

    aistorysimulatorNov 25, 2025
    CivitAI

    I downloaded one of your videos and dropped it into comfy and saw that your I2V model was locally named "wan22EnhancedLightning_v2I2VFP8HIGH", can I ask where you downloaded it from and which one it is? Thanks.

    SeoulSeeker
    Author
    Nov 25, 2025· 1 reaction

    Of course. I've been using the Enhanced Lightning high/low noise from taek75799 for a while because I'm exhausted from fucking around with individual lightning loras. So this has them baked in. Camera movement seems good and the output is pretty consistent. Here are the links:

    High noise: https://civitai.com/models/2053259?modelVersionId=2346136

    Low noise: https://civitai.com/models/2053259?modelVersionId=2346214

    SeoulSeeker
    Author
    Nov 27, 2025· 2 reactions
    CivitAI

    If anyone needs help with this workflow, please feel free to reach out via DM, I'm happy to help!

    foreversola777Nov 29, 2025
    CivitAI

    Where can I download RIFEInterpolation?
    By default, comfyui-frame-interpolation only includes the RIFE VFI (recommend rife47 and rife49) node.

    MahiroAGIDec 1, 2025· 2 reactions
    CivitAI

    man. This shit is good. Extremely simple and high quality. Thanks brother

    SeoulSeeker
    Author
    Dec 2, 2025

    You're welcome, enjoy!

    BackFox99Dec 2, 2025
    CivitAI

    im not sure if im doing this right but the output is always a black screen. it does give me audio of things but its a black screen does anyone know why?

    SeoulSeeker
    Author
    Dec 2, 2025

    Stupid question but did you drag your video into the 'Load Video' VHS node first before running MMAudio?

    BackFox99Dec 2, 2025

    @SeoulSeeker I did. But I got it to work! And its amazing! Do you know if there will be updates on this??

    SeoulSeeker
    Author
    Dec 2, 2025· 1 reaction

    @BackFox99 yay! And not to MMAudio itself because the author has never released the training script for V2. I left another comment earlier if you look for it, talking about the guy who made this NSFW finetune, releasing a new version based on a different model called ThinkSound. I check his hugging face repository quite frequently and haven’t heard any updates yet.

    AetherAiDec 2, 2025· 1 reaction
    CivitAI

    Thanks for the workflow it works great! Just one quick question since im not to good with comfy, is there a way to automatically use step 1 and 2? meaning the output video of rife going straight into the mmaudio in the same run. like using step 1 and 2 in the same run

    SeoulSeeker
    Author
    Dec 2, 2025· 1 reaction

    Of course. I just modified the workflow for you, should be just one run:

    https://pastebin.com/gDDHL8NG

    AetherAiDec 2, 2025· 1 reaction

    @SeoulSeeker Thank you so much for this!

    SeoulSeeker
    Author
    Dec 2, 2025

    @AetherAi My pleasure!

    AiddictedDec 4, 2025

    @SeoulSeeker I clicked the download button and it downloaded as a txt file? Is that supposed to happen or am I missing something?

    SeoulSeeker
    Author
    Dec 4, 2025

    @Aiddicted You should just be able to rename that text file as a .json file and then drag it into ComfyUI. Hope that helps

    AiddictedDec 4, 2025· 1 reaction

    @SeoulSeeker oh sick, The more you know 🌈

    AnomalousDec 7, 2025

    @SeoulSeeker Hello, I tried your modified wf and it seems to be outputting to temp/MMaudio instead of output\MMAudio. Any way to fix this? (I'm also a noob)

    SeoulSeeker
    Author
    Dec 7, 2025

    @Anomalous in the Video Combine VHS node, there's a section called 'filename_prefix'. That determines where the file will be saved. Hope that helps!

    AnomalousDec 7, 2025

    @SeoulSeeker Thanks for the response! I meant to say that it's outputting into ComfyUI\temp\MMAudio instead of ComfyUI\output\MMAudio. It seems to be using temp as a base folder no matter what i put under prefix

    SeoulSeeker
    Author
    Dec 7, 2025

    @Anomalous oh i'm sorry, now i understand what you're saying! at the bottom of the node, toggle 'save_output' on

    SeoulSeeker
    Author
    Dec 8, 2025

    @Anomalous Yes I think that's intended behavior, I remember seeing it too and looking at how to turn it off (because it creates a lot of redundant files) but never quite figured it out...

    AnomalousDec 8, 2025

    @SeoulSeeker Oh well, a minor inconvenience I suppose. My guess is that the audio sampler makes a copy of the input video then muxes it with the audio file it generates, but doesn't cleanup the duplicated video. Anyway, thank you very much!

    AiddictedDec 3, 2025· 3 reactions
    CivitAI

    HOLY SHIT! THIS IS A GAME CHANGER

    SeoulSeeker
    Author
    Dec 3, 2025

    hell yeah bro! enjoy :)

    AiddictedDec 5, 2025· 1 reaction
    CivitAI

    this thing is hilarious!! it makes characters speak simlish. i wonder if theres a way to get them to say actual words but simlish will do for now lol

    SeoulSeeker
    Author
    Dec 7, 2025

    I think the vanilla MMAudio can't produce proper language, so the nsfw finetune definitely can't lol

    Dano223Dec 7, 2025
    CivitAI

    Yoo thanks for this cool workflow really appreciate it, im having some troubles though, trust me when i say everything is 100% perfectly installed but when i put a video in ur workflow it pops it back up without sound. That is not good you know :( maybe its the fact that im just using step 5 and need to activate another step, anyway how to fix homie?

    SeoulSeeker
    Author
    Dec 7, 2025

    DM me so you can show me and we can fix it together

    tenstripDec 7, 2025· 3 reactions
    CivitAI

    Btw mmaudio synchformer runs at 25fps. Way better alignment at that rate.

    SeoulSeeker
    Author
    Dec 8, 2025

    Way better than 24fps do you think?

    tenstripDec 8, 2025

    @SeoulSeeker Sometimes the audio is still too fast for some motions but I only save the audio, and when it does sync up it's pretty exact and the audio clip is also the same length as the 25fps clip that I export and run through it.

    SeoulSeeker
    Author
    Dec 8, 2025

    @tenstrip gotcha. thank you, think i'll switch from 24 to 25 fps

    ValuedRenderDec 9, 2025
    CivitAI

    As I understand, I can simple use ffmpeg without interpolation
    ffmpeg -i input.mp4 -r 24 output.mp4

    SeoulSeeker
    Author
    Dec 9, 2025

    of course, but the ffmpeg command is cpu-based (slow) and struggles with fast motion. RIFE is gpu-accelerated and usually produces a better output overall

    ValuedRenderDec 11, 2025

    to use GPU with ffmpeg
    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
      -filter_complex "fps=30" \
      -c:v h264_nvenc -preset p7 -cq 19 output.mp4

    Even using cpu it works faster
    2 minutes video takes few seconds

    Thanks for information, I prefer 64 fps video, I don't need to do double interpolation coz

    Recommended FPS for MMAudio

    30 FPS — optimal

    25 FPS — acceptable (European standard)

    15 FPS — minimum, lower than this may cause poor lip-sync and motion analysis

    60 FPS — also OK, but the model will internally downsample the framerate to ~20–30 FPS.

    ❗ Important

    MMAudio is a multimodal audio-visual model, and it normalizes the video to its internal FPS, usually 20–30 FPS.

    Therefore:

    Converting from 60 → 30 FPS will not reduce quality for MMAudio.

    LastAssignmentDec 11, 2025

    @ValuedRender If you increase the framerate without interpolation then the video will just get sped up, that makes no sense.

    ValuedRenderDec 11, 2025

    @LastAssignment No it's not

    Changing the FPS of a video with FFmpeg does NOT slow it down or speed it up by default.

    When you re-encode a video and simply set a new FPS (for example, from 16 to 24 FPS), FFmpeg keeps the same video duration. It does this by adding or removing frames so the playback speed stays exactly the same.

    If you increase FPS → FFmpeg duplicates or interpolates frames to reach the new frame rate.
    Speed stays the same.

    If you decrease FPS → FFmpeg removes some frames.
    Speed stays the same.

    The only time FPS changes the speed is when you apply special options like -vf "setpts=..." or -r on the input, which explicitly modify the timing.

    So, simply changing FPS while re-encoding does not make the video faster or slower — it only changes how many frames it contains, not how fast it plays.
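    The disagreement in this thread comes down to two different operations: resampling (duration stays the same, the frame count changes) versus retiming (every frame is kept but played at a new rate, so the duration changes). A small arithmetic sketch, using a hypothetical 80-frame clip at 16fps:

```python
def resample(num_frames, src_fps, dst_fps):
    """Change the frame rate while keeping duration; frames are added or dropped."""
    duration = num_frames / src_fps
    return round(duration * dst_fps), duration

def retime(num_frames, src_fps, dst_fps):
    """Keep every frame but play it at the new rate; duration changes."""
    return num_frames, num_frames / dst_fps

frames, seconds = resample(80, 16, 24)
print(frames, seconds)  # 120 frames, still 5.0 seconds

frames, seconds = retime(80, 16, 24)
print(frames, round(seconds, 2))  # 80 frames, now about 3.33 seconds (sped up)
```

    Resampling without interpolation fills the extra frames with duplicates, which keeps the speed right but doesn't add any new motion.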

    ValuedRenderDec 11, 2025

    @LastAssignment I'm telling also about double interpolation in workflow example.
    Do you really need to do it twice ?
    I doubt I want to do interpolation for every generation at all. It's just a waste of time.
    A better option is to interpolate only the good generations. Once you have 40 or 60 fps videos, converting to a lower fps takes seconds even on CPU.

    LastAssignmentDec 11, 2025· 3 reactions

    @ValuedRender Interpolation with the popular RIFE VFI node takes around 1 second on modern hardware (with this easy fix https://github.com/Fannovel16/ComfyUI-Frame-Interpolation/pull/102), it's a complete nonissue. Reencoding multiple times with FFMPEG just adds another step of quality loss. The best way is to just integrate everything to use the raw images/latents as much as possible in a single workflow to keep degradation as low as possible. You can disable steps inside the workflow if your only intent is to test stuff before upscaling/interpolation.

    If you already have an upscaled/interpolated video and just want to add audio then the best way is to down/upscale it to 25FPS and only use that interpolated video for mmaudio and combine the original files.

    Someone make a node to re-encode with ffmpeg inside comfyui, because I sure as hell wouldn't be doing that manually, way too lazy. Probably gonna vibecode one myself quick.

    ValuedRenderDec 11, 2025

    @LastAssignment > Reencoding multiple times with FFMPEG just adds another step of quality loss.
    What multiple times? This (decreasing the fps) can be done at the end for the "good" generations, without any loss to the original video. Imagine how fast ffmpeg will be on an RTX 5090 if you know how to enable GPU support.
    It's just a waste of time to interpolate every generation. I'm done )

    ak4710315462Dec 9, 2025· 2 reactions
    CivitAI

    I also encountered Error(s) in loading state_dict for MMAudio: Missing key(s) in state_dict: "clip_input_proj.2.w1.weight", "clip_input_proj.2.w2.weight", "clip_input_proj.2.w3.weight", "text_input_proj.2.w1.weight", "text_input_proj.2.w2.weight", "text_input_proj.2.w3.weight". Unexpected key(s) in state_dict: "clip_input_proj.1.w1.weight", "clip_input_proj.1.w2.weight", "clip_input_proj.1.w3.weight", "text_input_proj.1.w1.weight", "text_input_proj.1.w2.weight", "text_input_proj.1.w3.weight". size mismatch for t_embed.mlp.0.weight: copying a param with shape torch.Size([896, 256]) from checkpoint, the shape in current model is torch.Size([896, 896]). My Sage Attention is 2.1. Will this cause an error? Please help me

    logmarch2948Dec 18, 2025

    I got this too. Any fix?

    daspin335Dec 26, 2025

    @logmarch2948 same

    daspin335Dec 27, 2025· 1 reaction

    solved:

    The error was caused by outdated nodes. Update them from the nodes list tab and double-check the model name. If it still persists, rename the main MMAudio model to something else and test again after making sure your relevant nodes are up to date.

    SeoulSeeker
    Author
    Dec 27, 2025

    @daspin335 Thanks for posting your solution. Hopefully this helps everyone else with the same error

    sushesousuo140801Dec 10, 2025
    CivitAI

    NICE JOB! How long a video can this workflow process at one time?

    SeoulSeeker
    Author
    Dec 10, 2025

    I think it was trained on ~8 second clips, so beyond that will probably fall apart.

    sushesousuo140801Dec 11, 2025

    @SeoulSeeker If I want to work with longer clips (over 8 seconds), do you have any tips to keep the timbre consistent in the video?

    SeoulSeeker
    Author
    Dec 11, 2025

    @sushesousuo140801 Unfortunately that's outside my realm of knowledge as I work exclusively with WAN 2.2 I2V and am bound by its video length constraints.

    ValuedRenderDec 11, 2025

    @SeoulSeeker
    Maybe using a fixed seed for all chunks will help.

    I'm trying this with a local clone of the repo: splitting the incoming video into 8-second chunks, processing them, and putting them back together.
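    A rough sketch of that chunking idea: it only builds the ffmpeg argument lists for fixed-length chunks and does not run anything. The 8-second length comes from the "~8 second clips" guess earlier in this thread, and the output names (`chunk_000.mp4` and so on) are placeholders.

```python
import math

def chunk_commands(src, duration_s, chunk_s=8.0, fps=25):
    """Build one ffmpeg argument list per chunk; nothing is executed here."""
    commands = []
    for i in range(math.ceil(duration_s / chunk_s)):
        start = i * chunk_s
        length = min(chunk_s, duration_s - start)  # last chunk may be shorter
        commands.append([
            "ffmpeg",
            "-ss", f"{start:.3f}",   # seek to the chunk start
            "-t", f"{length:.3f}",   # limit the chunk length
            "-i", src,
            "-r", str(fps),          # output frame rate
            f"chunk_{i:03d}.mp4",
        ])
    return commands

for cmd in chunk_commands("input.mp4", 20.0):
    print(" ".join(cmd))
```

    Each list can be passed to `subprocess.run` once the paths are right. Whether a fixed seed really keeps the voice consistent across chunks is untested here.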

    ValuedRenderDec 11, 2025

    @sushesousuo140801 This model phazei/NSFW_MMaudio produces very poor results. It makes anime sounds, noise, and so on. Everything is so bad that even short videos are unlistenable ( for me). Have to wait for better trainer )
    Better option is to train on own data.

    ValuedRenderDec 11, 2025
    SeoulSeeker
    Author
    Dec 11, 2025

    @ValuedRender it's the only nsfw finetune i've found so far. he's training another 40 hour dataset through ThinkSound but it's yet to be released. For normal length clips, as long as you toy with CFG/prompt/steps, phazei's is great. Nothing's perfect. I'd love to see your own trained model bud

    ValuedRenderDec 11, 2025

    @SeoulSeeker I wish I had time for this ) 80% of success is dataset. pre-processing, noise removal this will take much more than 40 hours )

    ValuedRenderDec 11, 2025
    CivitAI

    By the way
    CLIP: 8 FPS
    Synchformer: 25 FPS

    It makes more sense to do the interpolation at 60 fps once and pass 60 fps, it should be automatically adjusted by model

    NOTE: It takes longer to process high-resolution videos (>384 px on the shorter side). Doing so does not improve results.

    It also makes sense to reduce the video size.
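    To put numbers on this, here is a small sketch using the rates quoted above (CLIP at 8 FPS, Synchformer at 25 FPS) and the 384 px guideline. How MMAudio actually samples and resizes frames internally is not confirmed here, and both function names are made up for this example.

```python
def encoder_frame_counts(duration_s, clip_fps=8, sync_fps=25):
    """Frames each encoder would see for a clip, at the rates quoted above."""
    return int(duration_s * clip_fps), int(duration_s * sync_fps)

def fit_short_side(width, height, short_side=384):
    """Downscale so the shorter side is at most short_side, keeping aspect ratio."""
    scale = min(1.0, short_side / min(width, height))  # never upscale

    def even(value):
        # Round to the nearest even number, which most video encoders require.
        return max(2, int(round(value * scale / 2)) * 2)

    return even(width), even(height)

print(encoder_frame_counts(8))     # (64, 200)
print(fit_short_side(1280, 720))   # (682, 384)
```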

    LastAssignmentDec 11, 2025· 2 reactions
    CivitAI

    You should switch your video upload nodes to the "Load Video FFMPEG" Node to avoid color drift, it's a massive difference. You should also try to avoid reencoding videos (uploading videos too many times) as much as possible to prevent any kind of degradation.

    SeoulSeeker
    Author
    Dec 11, 2025· 1 reaction

    Thank you for those tips, I could never figure out the color drift issue! Will update the workflow accordingly.

    Shamino_The_AnarchDec 12, 2025

    @LastAssignment They were and this is exactly the reason: i noticed it is the frame_load_cap which changes depending on mode and is less than the source frames. For a short clip I get 120 frames with AnimateDiff setting, 117 with WAN and IIRC 115 with LTXV. I guess there is a reason for it. But then again I created that clip with WAN in the first place, so why is the output not 117 frames in total and is 120 instead... maybe ffmpeg needs frames to be a multiple of a certain number and pads it on file creation?

    LastAssignmentDec 12, 2025

    @Shamino_The_Anarch I'm honestly confused as well :)

    Workflows
    Wan Video 2.2 I2V-A14B

    Details

    Downloads
    3,338
    Platform
    CivitAI
    Platform Status
    Available
    Created
    11/18/2025
    Updated
    6/14/2026
    Deleted
    -