CivArchive
    Wan.Humo Music Video Automation Workflow. - v1.0
    NSFW

    🎬 AI Music Video Workflow (ComfyUI)

    Turn your favorite tracks into fully AI-generated cinematic music videos, automatically, right inside ComfyUI. NO POST EDITING NEEDED.
    This workflow takes a reference image and an audio file, then generates a lip-synced video that matches the lyrics, mood, and scene dynamics. The process is about 95% automated.

    For some reason, most of the example videos are not showing up; I'm guessing they are too long. You can find them all here

    A high-level walkthrough can be found here

    Need help or have questions? Please reach out through Discord


    ✨ What It Does

    • 🎭 Keeps your reference image as the main performer across all scenes.

    • 🎶 Splits audio into lyric-synced snippets for perfect timing.

    • 🖋️ Uses a custom Prompt creator node that sends custom instructions to an LLM node to build cinematic prompts from the lyrics and your style choices.

    • 🎥 Generates scene-by-scene visuals, then combines them into a seamless final video.
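
    As a rough illustration of the lyric-synced splitting idea, here is a minimal sketch that groups timestamped transcript segments into snippets of bounded length. It is not the code used by the custom nodes. The `group_segments` function, the `Snippet` type, and the `max_seconds` value are hypothetical, and the input is assumed to be a list of dicts with `start`, `end`, and `text` keys, similar to what a Whisper transcription returns.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    start: float  # snippet start time in seconds
    end: float    # snippet end time in seconds
    text: str     # lyrics covered by this snippet


def group_segments(segments, max_seconds=4.0):
    """Merge consecutive segments while the merged snippet stays within max_seconds.

    max_seconds is an illustrative value, not a documented limit of the workflow.
    A single segment longer than max_seconds is kept as its own snippet.
    """
    snippets = []
    current = None
    for seg in segments:
        start, end = float(seg["start"]), float(seg["end"])
        text = seg["text"].strip()
        if current is not None and end - current.start <= max_seconds:
            # Still short enough: extend the current snippet.
            current.end = end
            current.text = f"{current.text} {text}".strip()
        else:
            # Too long (or first segment): close the current snippet, start a new one.
            if current is not None:
                snippets.append(current)
            current = Snippet(start, end, text)
    if current is not None:
        snippets.append(current)
    return snippets


# Made-up example segments (not real output from the workflow).
example = [
    {"start": 0.0, "end": 1.5, "text": " City lights"},
    {"start": 1.5, "end": 3.5, "text": " are calling"},
    {"start": 3.5, "end": 6.0, "text": " me home tonight"},
]
snippets = group_segments(example, max_seconds=4.0)
```

    Each resulting snippet carries the start and end times needed to cut the audio and the lyric text that a prompt builder could use for that scene.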

    The samples I provided were all created inside ComfyUI with NO post edits.

    On an RTX 5090, it took around 2 hours to generate the full song.

    More examples can be found here and more will be added as I make them.


    🔧 Key Features

    • Reference Image Control – Import your character photo (headshot recommended); the workflow auto-removes the background and resizes the image for clean framing.

    • Audio Handling – Automatic vocal/instrument separation, Whisper V3 transcription, advanced settings for lyric overlap, and fallback options.

    • Prompt Creator – Flexible scene builder with fields for style, theme, lighting, camera motion, outfits, and more, so you can get a custom look.

    • Auto Queueing – Handles multi-run videos seamlessly for long audio files.

    • Final Render Automation – Collects all video chunks, merges them, and saves your finished video as FINAL_VIDEO.mp4.

    • This workflow uses the native Gemini LLM API node by default, which receives detailed instructions generated by the Prompt Creator node. You can swap Gemini out for another LLM if you prefer, but the instruction sets are fairly complex, and most local models struggle to follow them reliably. If you’d rather not use an LLM at all, you can enter prompts manually instead; just reach out on Discord for extra guidance and tips. For context, I’ve spent only $5 so far, which has powered 50+ videos (under $0.10 per video), and I still have credit left, so it’s been very cost-effective.
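
    For readers curious about what the final merge step could look like outside ComfyUI, here is a minimal sketch that prepares an ffmpeg concat-demuxer job for a set of video chunks. This is an assumption about one possible approach, not the method used by the custom nodes. The helper names and the chunk file names are hypothetical; only the output name FINAL_VIDEO.mp4 comes from the description above.

```python
def build_concat_list(chunk_paths):
    """Return the text of an ffmpeg concat list file, one `file '...'` line per chunk."""
    lines = []
    for path in chunk_paths:
        # Inside single quotes, the concat demuxer expects ' to be written as '\''
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_file, output="FINAL_VIDEO.mp4"):
    """Return the argument list for a stream-copy concat (no re-encoding)."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", str(list_file),
        "-c", "copy",     # copy streams instead of re-encoding
        str(output),
    ]


# Hypothetical chunk names; zero-padded so that sorting keeps them in order.
chunks = sorted(["chunk_002.mp4", "chunk_001.mp4", "chunk_003.mp4"])
list_text = build_concat_list(chunks)
command = build_ffmpeg_command("chunks.txt")

# To actually merge, write list_text to chunks.txt and run the command, e.g.:
#   import subprocess
#   subprocess.run(command, check=True)
```

    Stream copy only works cleanly when all chunks share the same codec, resolution, and frame rate, which is a reasonable expectation when every chunk comes from the same generation settings.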


    🚀 Quick Start

    1. Upload your reference image.

    2. Load your audio file.

    3. Set your folder name (e.g., the song title).

    4. Fill in Prompt Creator fields (style, mood, shots, etc.).

    5. Hit Run — everything else is automated.

      • The workflow will auto-queue middle runs for long audio files.

      • For the final pass, it will tell you which groups to mute.

      • Simply follow the on-screen instructions, hit run again, and the workflow finishes the process automatically. (You do not have to wait for runs to finish. You just mute and hit run once more.)


    🎵 Creative Workflow Tip

    Just like real music videos, you don’t have to stick to one pass. You can run the same audio file multiple times with different reference images or styles — for example:

    • One pass with the lead singer as the performer.

    • Another pass featuring a band member or supporting character.

    • Additional passes experimenting with different themes, outfits, or camera styles.

    Later, you can edit these separate video runs together, cutting between performances or blending visual moods — exactly how professional music videos are produced with multiple takes.


    📦 Required Custom Nodes

    This workflow relies on a set of custom nodes I built specifically for it.
    You’ll need to install them before running the workflow:

    👉 ComfyUI-VRGameDevGirl Custom Nodes (GitHub)

    They can also be installed via the manager.

    These nodes handle:

    • Audio splitting, transcription, and auto-queueing.

    • Smart folder management and metadata tracking.

    • Popup instructions for multi-run projects.

    • Scene sync and frame adjustments for HuMo compatibility.

    • Video combining and more.


    👉 Join the Discord community for support, tips, and tricks.


    ✅ In Summary

    This workflow is designed for creators, musicians, and visual storytellers who want to merge AI visuals with music. With automatic transcription, smart prompt handling, and seamless video assembly, you can focus on creative direction while the workflow handles the heavy lifting.

    Comments (15)

    dannyboy33
    Oct 23, 2025
    CivitAI

    That looks amazing. I want to try it out but can't spend the cash, so I'll try to switch the Gemini node to a local LLM node. Hope it works :)

    vrgamedevgirl
    Author
    Oct 24, 2025· 1 reaction

    It's sooo cheap. $5 will last you many, many videos. I put in $5 and made over 50 videos. Local models are not smart enough to handle the complex instructions. You could prompt manually, though: just find the text encoder nodes in the groups, unpin them, expand them, and unplug the noodle from the input. Then you can disable auto-queue and run each set manually, using your own prompts or prompts from ChatGPT. I'm working on some new nodes, though, that will let you just use GPTs, so you won't need an LLM node. It's an extra step or two, but it works.

    dannyboy33
    Oct 24, 2025· 4 reactions

    @vrgamedevgirl It is cheap, but I'm cheaper :) I made a pledge to myself to try to do everything locally after spending too much on a GPU. Anyway, you're right about local models. I'm using qwen3-4b to keep as much VRAM available as possible, and pushing it twice (the second time asking it to adhere to the formatting) seems to work OK. This workflow is great though, really inspiring me to try to build some of those longer WFs I've been too lazy to build.

    kroms50
    Nov 3, 2025

    @vrgamedevgirl Would love a GPT version of this. Interesting concept :)

    veldierin
    Nov 27, 2025

    @dannyboy33 Could you share your workflow for this? I would prefer using a local model as well.

    vrgamedevgirl
    Author
    Nov 28, 2025· 1 reaction

    @veldierin The workflow is in the workflows folder that comes with the custom nodes. There is a new manual workflow that does not use an LLM node; instead, you can use GPTs I created. You can use them with the free version of ChatGPT.

    kallamamran
    Oct 24, 2025
    CivitAI

    Incredible work. The workflow is something of a mess when you haven't built it yourself. Finding my way around is hard, but man... This is very good!!!

    vrgamedevgirl
    Author
    Oct 24, 2025· 1 reaction

    Thanks!!! Also, if you follow the video walkthrough, it's very easy. It's not a mess per se, it's just very complex.
    And you don't even have to look at most of the nodes. Just follow steps 1-4, and then one or two nodes need to be touched. It's a well-organized workflow; you just need to watch the walkthrough video.

    kallamamran
    Nov 21, 2025

    @vrgamedevgirl Love it :)

    JamesBand
    Oct 28, 2025
    CivitAI

    That's a revolution, Dorothy! About the batch image node for the Gemini input, do we have to create one, or is it somewhere in this oceanic workflow?

    vrgamedevgirl
    Author
    Oct 28, 2025· 1 reaction

    You don't need to connect any images to the Gemini node. You don't really have to do anything besides setting the main ref image, song, and folder, then making sure you mute groups when needed. I would reach out on Discord for support, as I'm there pretty much every day.

    The link to the server is in the description.

    And thanks!!! :)

    kroms50
    Nov 3, 2025· 1 reaction
    CivitAI

    This is so epic, thank you for spending the time to make it. I do have an issue: no matter what I do, I can't seem to find the GeminiNode. I've even installed a few off of GitHub, but to no avail. I'd like to try running this from a local LLM, and I'm not good at ComfyUI nodes (I'll break it). How would I use a local LLM with this? Much appreciated and thank you!

    vrgamedevgirl
    Author
    Nov 4, 2025

    Hey! I'm guessing you need to update ComfyUI to get the Gemini node. You can use other LLM nodes, but the instructions are very complex and most smaller models just can't handle them. I would recommend reaching out on Discord; the link to the server is in the description.

    7180347
    Jan 30, 2026
    CivitAI

    Whoa! How did I miss this? This is so promising, I hope it works for my setup!

    oceanvaillapel970755
    Feb 14, 2026
    CivitAI

    Never mind, figured it out. The only problem now is that I'm not currently using an LLM. It would be awesome to use llm studio if that were possible, and just have the gen pause after it has generated all the prompts so you can dump it from memory. Anyway, I can't figure out how to edit prompts without an LLM.

    Workflows
    Other

    Details

    Downloads: 618
    Platform: CivitAI
    Platform Status: Available
    Created: 10/19/2025
    Updated: 5/13/2026
    Deleted: -

    Files

    wanHumoMusicVideo_v10.zip

    Mirrors

    CivitAI (1 mirrors)