CivArchive
    Hunyuan Video / Wan LoRA training toolkit - Caption videos

    Preparing datasets for LoRAs can be a pain in the ass. Being lazy, I attempted to automate as much of the process as possible.

    First, I have a workflow that extracts videos from a folder and converts their fps to a target framerate of your choice (default 24 fps for Hunyuan). To do that, it takes the least common multiple (lcm) of the original and target fps (for example, 120 for 30 and 24), uses FILM to interpolate up to that lcm, and then keeps only the frames it needs to get back down to the target fps. It's a bit overkill, but if your computer can handle it, it can save a bit of hassle.
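
    The fps arithmetic can be sketched in a few lines of Python. This is only an illustration of the math, not the workflow's actual node code: the function names are made up, and the assumption that interpolating N frames by a factor m yields (N - 1) * m + 1 frames may not match what the FILM node outputs exactly.

```python
from math import lcm  # available in Python 3.9+


def resample_plan(source_fps: int, target_fps: int) -> tuple[int, int, int]:
    """Return (lcm fps, interpolation factor, keep-every-nth step)."""
    common = lcm(source_fps, target_fps)
    interp_factor = common // source_fps  # how much to interpolate up
    keep_every = common // target_fps     # which frames survive on the way down
    return common, interp_factor, keep_every


def kept_indices(num_source_frames: int, source_fps: int, target_fps: int) -> list[int]:
    """Indices (in the interpolated sequence) that are kept for the target fps.

    Assumption: interpolation inserts (factor - 1) frames between each pair
    of neighbouring source frames, giving (N - 1) * factor + 1 frames total.
    """
    _, factor, step = resample_plan(source_fps, target_fps)
    total = (num_source_frames - 1) * factor + 1
    return list(range(0, total, step))


if __name__ == "__main__":
    print(resample_plan(30, 24))  # (120, 4, 5)
    print(resample_plan(25, 24))  # (600, 24, 25)
```

    For 30 to 24 fps, the video is interpolated 4x and every 5th frame is kept. For 25 to 24 fps the lcm is 600, so the video has to be interpolated 24x before it is thinned out, which is why unusual source framerates get expensive.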

    I had to create a custom node for the math, here it is if the manager can't find it: https://github.com/EmilioPlumed/ComfyUI-Math.

    Second, I have another workflow that captions the videos in a folder. I have it set up to use Joy Captioner alpha 2 to get a verbose description, and two WD14 taggers to get tags from different points in the video. Each of the 3 captions has a slider to select which portion of the video the description or tags are extracted from. Then, it eliminates duplicate tags and puts the text together in a .txt file.
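
    As a rough sketch of the post-processing described above, here is how picking a frame from a slider position and merging the tag lists could look in plain Python. The function names, the 0.0 to 1.0 slider range, the case-insensitive duplicate check, and the "description followed by tags" output layout are all assumptions for illustration; the workflow itself does this with ComfyUI nodes.

```python
def frame_at(fraction: float, num_frames: int) -> int:
    """Map a slider position (assumed 0.0 to 1.0) to a frame index."""
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range values
    return round(fraction * (num_frames - 1))


def merge_tags(*tag_strings: str) -> list[str]:
    """Merge comma-separated tag strings, dropping duplicates, keeping order."""
    seen: set[str] = set()
    merged: list[str] = []
    for tag_string in tag_strings:
        for tag in tag_string.split(","):
            tag = tag.strip()
            key = tag.lower()  # assumption: "Smile" and "smile" are duplicates
            if tag and key not in seen:
                seen.add(key)
                merged.append(tag)
    return merged


def build_caption(description: str, *tag_strings: str) -> str:
    """Join the verbose description and the deduplicated tags into one line."""
    tags = ", ".join(merge_tags(*tag_strings))
    return f"{description.strip()} {tags}".strip()


if __name__ == "__main__":
    caption = build_caption(
        "A woman walks through a park.",
        "1girl, smile, outdoors",   # tags from the first tagger
        "smile, 1girl, walking",    # tags from the second tagger
    )
    print(caption)  # A woman walks through a park. 1girl, smile, outdoors, walking
```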

    The workflows do require you to rename your files to be numbered consecutively.

    Comments (9)

    midiaplaay · Jan 11, 2025
    CivitAI

    Great workflow. I will try it in musubi tuner.

    SingularUnity · Jan 12, 2025
    CivitAI

    What's the lowest VRAM you've been able to run it with? I'm currently using a 3090 with 24 GB and wouldn't mind giving this a try if my system can run it.

    bonetrousers
    Author
    Jan 12, 2025

    I have a 4090 with 24 GB. The Joy caption model pretty much fills all the VRAM, but you should be fine. The interpolation workflow is light on VRAM, but longer videos with an unusual fps (like 25) will end up taking a lot of system RAM. I upgraded recently because 32 GB wasn't enough in those cases.

    SingularUnity · Jan 12, 2025 · 2 reactions

    @bonetrousers thanks for the info. I currently have 64 GB of system RAM, so I will give this a shot.

    tensor_fanatic · Jan 17, 2025 · 3 reactions
    CivitAI

    You need to install the "python-interpreter-node" from ComfyUI Manager for this to work.

    Mystos · Jan 29, 2025

    Thanks, Comfy Manager didn't catch this.

    SteveWarner · Jan 20, 2025 · 2 reactions
    CivitAI

    From one lazy person to another, let me say THANK YOU!

    midiaplaay · Jan 28, 2025
    CivitAI

    Does this workflow work with image-only or mixed datasets?

    bonetrousers
    Author
    Jan 28, 2025

    As it is currently set up, you'd have to separate your videos from your images and run them in different workflows.

    I'll add an example workflow for captioning single images.

    Workflows
    Other

    Details

    Downloads: 1,832
    Platform: CivitAI
    Platform Status: Available
    Created: 1/11/2025
    Updated: 5/13/2026
    Deleted: -

    Files

    hunyuanVideoWanLora_captionVideos.zip
