MMaudio-test1
You can use this with: https://civarchive.com/models/2483045/mmaudio-adapter-comfyui
It's recommended to always include "birdsong, music" in the negative prompt.
While it will work with just the basic base prompt "nsfwtest," you'll need to put what you want to remove in the negative prompt. For example, if your positive prompt is "moaning, voice" and your negative prompt is "talking, birdsong, music," you're more likely to get only female moaning sounds.
"young girl" is an option to make the female voice sound younger.
Comments (15)
we need a local one xD
I'll upload it now, so please try it out. I'm not very familiar with ComfyUI, so there might be bugs in the workflow.
https://civitai.com/models/2483045/mmaudio-adapter-comfyui
This is a workflow for ComfyUI. Please try it.
It seems impossible to get the girl to make sounds through prompts alone without using a LoRA.
I'll also release a local version of the LoRA creation tool soon! I'll update the Google Colab version to support it as well.
@diffusion078746 When training the LoRA, did you only use a girl's voice, or did you include other sound effects as well?
@TohnoAkiha
My dataset consists of MP4 files with audio, each paired with a text file.
Sound effects are not separated out. If a clip includes music or other sound effects, listing them as separate tags in the text file (e.g., music, sound effect) would make it easier to target them with negative prompts.
Writing the text file in more detail might yield better results, but I haven't tried that yet.
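As a rough illustration of the dataset layout described above, here is a minimal sketch that pairs each MP4 clip with its text file and splits the caption into tags. The same-name convention (`clip.mp4` next to `clip.txt`), the comma-separated tag format, and the function names are assumptions made for this example; the trainer's actual expected layout isn't stated here.

```python
from pathlib import Path


def parse_tags(caption: str) -> list[str]:
    """Split a comma-separated caption into trimmed, non-empty tags."""
    return [tag.strip() for tag in caption.split(",") if tag.strip()]


def pair_clips_with_captions(dataset_dir: str) -> tuple[dict[str, list[str]], list[str]]:
    """Return ({clip name: tags}, [clips without a caption file]).

    Assumption: each clip `name.mp4` has its caption in `name.txt`
    in the same folder. This is a hypothetical convention.
    """
    root = Path(dataset_dir)
    paired: dict[str, list[str]] = {}
    missing: list[str] = []
    for clip in sorted(root.glob("*.mp4")):
        caption_file = clip.with_suffix(".txt")
        if caption_file.is_file():
            paired[clip.name] = parse_tags(caption_file.read_text(encoding="utf-8"))
        else:
            missing.append(clip.name)
    return paired, missing
```

Running `pair_clips_with_captions("my_dataset")` before training gives a quick view of which clips lack captions and which tags (such as music or sound effect) appear, which are the ones you could later put in the negative prompt.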
@diffusion078746 My audio files currently only feature girls' voices, and each one is over 20 seconds long. Is it mandatory to use audio clips that are under 8 seconds?
@TohnoAkiha It's not mandatory. In fact, MMaudio seems to prefer longer audio clips.
@diffusion078746 Does the training data need to include video, or is audio-only sufficient?
@TohnoAkiha I'm not entirely sure myself, but I think it's probably possible to associate words with sounds using audio alone (for example, if you want to teach a specific character's voice).
However, if there's a strong connection between the audio and the video content, such as the atmosphere of the scene, then including the video would be better.
@diffusion078746 It failed. The trained LoRA produces no sound; that might be because my training data was audio-only, with no video.
@TohnoAkiha I see. Thank you for the review.
When I have time, I'll consider doing audio-only training.
Also, I think it would be better to have trigger keywords for the LoRA itself.
How does this compare to the mmaudio large 44k nsfw gold 8.5 final fp16 lora? Is this based off of that one?
This model is essentially a test model, so a more carefully trained model is recommended.
This LoRA is a preview for these (https://civitai.com/models/2483045/mmaudio-adapter-comfyui, https://civitai.com/models/2487771/mmaudio-adapter-trainer).
Use them if you need a specific MMaudio LoRA.