(NSFW) MMAudio ComfyUI Use Ultralytics to crop parts of the video Experimental PoC - v0.9

NSFW

Experimental workflow combining an Ultralytics detector with MMAudio.

You only need 4 custom nodes, which you probably already have:

ComfyUI Impact Pack

rgthree-comfy

ComfyUI-VideoHelperSuite

ComfyUI Impact Subpack

If you want a good, simple MMAudio workflow, @SeoulSeeker made a very good one you can try:
https://civarchive.com/models/2137833/nsfw-dead-simple-mmaudio-rife-interpolation-setup-for-wan-22-i2v-14b

My workflow doesn't have an interpolation step; all my videos are already at 24 FPS.
Make sure yours are too.
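
As a rough way to check the 24 FPS requirement before loading a clip, here is a minimal sketch. It assumes ffprobe (from FFmpeg) is installed and on your PATH; the helper names `parse_frame_rate`, `probe_frame_rate`, and `is_24_fps` and the tolerance value are my own and not part of the workflow.

```python
import json
import subprocess
from fractions import Fraction


def parse_frame_rate(rate: str) -> Fraction:
    """Convert an ffprobe rate string such as "24/1" or "24000/1001" to a Fraction."""
    return Fraction(rate)


def probe_frame_rate(path: str) -> Fraction:
    """Ask ffprobe for the frame rate of the first video stream in `path`."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_frame_rate(json.loads(out)["streams"][0]["r_frame_rate"])


def is_24_fps(rate: Fraction, tolerance: float = 0.05) -> bool:
    """True if the rate is 24 FPS within a small tolerance (23.976 also passes)."""
    return abs(float(rate) - 24.0) <= tolerance
```

For example, `is_24_fps(probe_frame_rate("my_clip.mp4"))` (with a hypothetical file name) tells you whether a clip can go straight into the workflow or needs interpolation or re-encoding first.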

A few months ago I was struggling with MMAudio. The model uses the motion and faces across your whole video to generate the sounds, so when there is too much going on it tends to get confused. I first experimented with DaVinci: I cropped and zoomed into part of my video, such as the face, to get well-synced moaning, for example.
But the process was tedious.

So my idea was to crop directly in ComfyUI. I first thought of using a mask, but you would need to redraw it every time depending on the composition.
My knowledge of automated mask drawing was limited, so I dropped that idea.

Then last week it clicked: "Why not use Ultralytics to detect the face, the area around it, the hands, the ass, etc., give that cropped part to MMAudio, and generate sound for it?"
After a few tries with the SEGS detector for video, I found what I think is a very good way of doing it. It takes a bit longer, but the results are pretty good.

It is not perfect, of course. MMAudio will still do its random shit, but it is way better on some of my videos. Some videos don't need it, such as ones where most of the subject and action are centered.

How to Use

You can choose your BBox detector. I've tried Anzhc face, yolov8m/s/n, and yolov9c, and the results are slightly different for each, but you can use pretty much any face detector you want.
You can also play with the crop factor; I found 1.2 to be a good ratio.
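
To give an idea of what a crop factor of 1.2 means, here is a minimal sketch. It assumes the factor scales the detected box around its center and that the result is clamped to the frame; this is my reading of the setting, not the Impact Pack source code, and `expand_bbox` is a hypothetical helper.

```python
def expand_bbox(bbox, crop_factor, frame_w, frame_h):
    """Scale an (x1, y1, x2, y2) box around its center and clamp it to the frame.

    Assumption: crop_factor multiplies the box width and height, so 1.2
    adds 10% of padding on each side of the detection.
    """
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    new_w, new_h = w * crop_factor, h * crop_factor
    # Clamp so the crop never leaves the frame.
    nx1 = max(0, int(round(cx - new_w / 2)))
    ny1 = max(0, int(round(cy - new_h / 2)))
    nx2 = min(frame_w, int(round(cx + new_w / 2)))
    ny2 = min(frame_h, int(round(cy + new_h / 2)))
    return nx1, ny1, nx2, ny2
```

Under these assumptions, a 100x100 face box at (100, 100, 200, 200) becomes a 120x120 crop. A larger factor gives MMAudio more context around the face, while a smaller one keeps it focused on the detected area.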

With the Video Combine preview you can check whether the detection was good; this part will be sent to the MMAudio sampler.
Once the detection is done there is no need to touch the SEGS again, so you can generate a bunch of tries with different prompts and seeds.

I added a 2nd detector in case you want to detect something else, or even use the classic MMAudio model to add other SFX. You can toggle it on/off with the Fast Groups Bypasser.
If you want to generate a bunch of tries with only the 2nd detector, just fix the seed on the first crop.
Make sure to leave detection enabled on the 2nd crop if you only want to generate with the first MMAudio; then you can fix the first and play with the 2nd MMAudio.
I've added an audio preview for the 2nd crop.

You can adjust the volume of the 2nd MMAudio to get a better mix.

I've also added a Master Volume and a First Crop Volume.
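
Here is a minimal sketch of how the three volume controls could combine. It assumes each track is a list of float samples in the range [-1.0, 1.0] and that the mix is a simple weighted sum followed by clipping; the actual nodes in the workflow may behave differently, and `mix_tracks` is a hypothetical helper.

```python
def mix_tracks(first, second, first_volume=1.0, second_volume=1.0, master_volume=1.0):
    """Mix two sample lists with per-track gains and a master gain.

    The shorter track is padded with silence, and the result is clipped
    to [-1.0, 1.0] to avoid overflow when both tracks are loud.
    """
    length = max(len(first), len(second))
    mixed = []
    for i in range(length):
        a = first[i] if i < len(first) else 0.0
        b = second[i] if i < len(second) else 0.0
        sample = master_volume * (first_volume * a + second_volume * b)
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

With this model, lowering `second_volume` keeps the extra SFX under the main track, and `master_volume` scales the final result without changing the balance between the two.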

I hope I was clear enough. If you have any trouble operating it, have questions, or think I missed something, just let me know.

Description

V0.9 experimental

Comments (4)

Tetsuoo · Apr 13, 2026

Works pretty well (there was a bug with MMAudio though), thanks for sharing.
Now I wonder if this is good only for porn or if there are other uses for it? Not that I'm complaining lol

PS: Nobody seems to use this for some reason: with rgthree installed you can go to rgthree Settings > Groups > Show fast toggle in Group Headers

Author

Apr 13, 2026

Thanks. Do you mean some of the connections were bugged, or just MMAudio doing weird shit?

I guess you can also use it with the classic SFW MMAudio model. I tried it a bit to make some laughs, but the characters only spoke gibberish; maybe my prompt was not good enough for it.

I didn't know about this rgthree option.

Tetsuoo · Apr 13, 2026 · 1 reaction

@HmNike MMAudio was stuck and couldn't download some stuff; this post solved it: https://github.com/hkchengrex/MMAudio/issues/98
According to Perplexity AI it was caused by an incompatibility between MMAudio and huggingface_hub, something like that. Classic painful bugs that are impossible to fix for the common mortal x)

Author

Apr 13, 2026

@Tetsuoo Ah OK, I think I downloaded everything manually when I installed it a few months ago. Thanks for sharing the fix here.

Workflows

Wan Video 2.2 I2V-A14B

Details

Downloads

404

Platform

CivitAI

Platform Status

Available

Created

4/12/2026

Updated

7/27/2026

Deleted

-

Files

NSFWMmaudioComfyuiUse_v09.zip

Size:

5.98 KB

SHA256:

38b37d7dd1714eef7cda030cf6c0462f4804afb6af5a9910c94e2e0c15856b8a

Mirrors

HuggingFace (1 mirrors)

NSFWMmaudioComfyuiUse_v09.zip

CivitAI (1 mirrors)

NSFWMmaudioComfyuiUse_v09.zip