This workflow currently supports three types of models:
Standard LTX 2.3 distilled
LTX 2.3 distilled GGUF
10Eros
💡 The models are self-contained. You can safely delete the entire group of whichever model you don't use without breaking the workflow. The remaining model groups will work independently without any additional changes needed.
This workflow is a modular and flexible text/image/audio-to-video generation system built in ComfyUI, designed to give full control over video creation using LTX-based models. It allows you to easily mix and match multiple generation modes such as text-to-video, image-to-video, lipsync, and fully guided animation by enabling or disabling grouped nodes.
📝 Personal notes:
The 10Eros model is better for NSFW content, whereas the standard model is better for SFW generations. The body movement of the 10Eros model can occasionally benefit SFW content too, but in general, stick to that split.
Try to always use two-phase sampling (half res + 2× upscaler); it yields the best quality and character consistency. LTX is not good at all at preserving character ID, so don't make it worse with a single-pass generation. The upscaler model adds extra detail and improves character consistency, which is why I recommend it.
Don't use the detailer when generating "Amateur look" videos. It adds a light layer of detail to the final result, which most of the time looks too "polished" for a real amateur recording; amateur-style videos look more real when they look low quality.
Main features
GGUF support
Prompt relay for segmented prompts
NSFW prompt enhancer
Text, image, audio, and ControlNet-driven video generation
LoRA support (character, style, and voice via ID LoRA)
Custom or AI-generated audio with automatic syncing
Reference image + up to 7 keyframes (FFLF animation control)
ControlNet video guidance with hybrid reference support
Half-res sampling + 2× upscaling for faster high-quality results
LTX detailer for enhanced final output
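As a rough illustration of the half-res sampling + 2× upscaling feature, the sketch below works out the first-pass and final sizes for a target resolution. The function name and the rounding to a multiple of 32 are assumptions made for this example; the workflow's own nodes decide the actual sizes.

```python
def half_res_plan(target_width, target_height, multiple=32):
    """Return (first_pass_size, upscaled_size) for a half-res + 2x pipeline.

    `multiple` is an assumed alignment constraint, not a value taken
    from the workflow.
    """
    def snap(value):
        # Round down to the nearest multiple, never below one multiple.
        return max(multiple, (value // multiple) * multiple)

    half_w = snap(target_width // 2)
    half_h = snap(target_height // 2)
    return (half_w, half_h), (half_w * 2, half_h * 2)


first_pass, final = half_res_plan(1280, 720)
print(first_pass, final)  # (640, 352) (1280, 704)
```

The first pass covers roughly a quarter of the final pixel count, which is where the speed gain comes from.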
Common Setups
Text to video:
All bypassers disabled + Prompt + Default audio
Image to video:
Prompt + Reference image + Default audio
Lipsync:
Prompt + Reference image + Custom audio
Audio to video:
Prompt + Custom audio only
Character LoRA + voice cloning:
Prompt + Character LoRA + ID LoRA + Default audio
Voice reference to video:
Prompt + ID LoRA + Default audio
OR
Prompt + ID LoRA + Reference image + Default audio
Character animation:
Prompt + ControlNet + Reference image + (Custom or Default audio)
First frame → last frame:
Prompt + Keyframe 1 + Keyframe 2 + (Custom or Default audio)
First → middle → last frame:
Prompt + Keyframe 1 + Keyframe 2 + Keyframe 3 + (Custom or Default audio)
Character animation with custom voice:
Prompt + Reference image + ID LoRA + ControlNet + Default audio
Detailed instructions are contained in the workflow itself:
Red nodes are instructions and useful notes.
Yellow nodes are configurable elements you can adjust to your needs.

Description
- Added LTX2.3 1.1 support.
- Added Prompt relay support.
- Added extra keyframes (now 8 in total).
- Enhanced the upscaling process; now all the keyframes are taken as reference for upscaling, not just the first one.
Comments (44)
V1 is my fav Ltx workflow, by far. now that V2 is out, I'm excited to try it! TY!
Thanks, I don't remember if it was you who suggested using the keyframes for upscaling so quality isn't lost, but if it was you, thanks again hehe
@LatentHeart Yes but I didn't want to trouble you so I deleted the comment, lol. Thanks a ton :)
Is it possible to save ControlNet "movements" and load them after they have been generated? Or do I need to regenerate them every time, even if it is the same video input?
Yes, you can. You will need to modify this workflow to achieve that, but any workflow that takes a preprocessed ControlNet input can work the way you describe. In this workflow, for example, in the ControlNet group, you can see that the preprocessed input is connected to a "Resize Image/Mask" node, right after the ControlNet type selector switch. If you are directly loading a preprocessed ControlNet video, you can bypass all the nodes between the "Load video" node and that node.
Hello! Maybe my comment isn't really related to this WF, but I need someone to help me find a WF for Video to Video, please! I'd really appreciate it if someone could help me! 😊✨
@FlowSpecial Wow! I'll give it a try as soon as I can. Thank you so much! 😍✨
Kind of a Noob question, but for the life of me, I can't find the config file mentioned
The workflow is a JSON file, which CivitAI automatically classifies as a "Configuration file". That doesn't matter, though: just download the JSON file and drag and drop it into ComfyUI. Now, I'm not trying to be mean here, OK? But if you are just starting out with ComfyUI, this workflow may be too advanced for you. For starters, you will need to download the model files, which requires knowing what's best for your specific setup. You will also need to clone the prompt relay repository from GitHub, and possibly troubleshoot things here and there if you install some custom nodes.
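Since the workflow is plain JSON, it can be inspected before loading it, for example to see which node types it relies on. This is only a sketch: it assumes the file has a top-level "nodes" list whose entries carry a "type" key, as editor-exported ComfyUI workflows generally do, and it does not look inside subgraphs. The function name and the sample data are made up for illustration.

```python
import json
from collections import Counter


def count_node_types(workflow):
    """Count nodes per type in a workflow dict (top-level nodes only)."""
    return Counter(
        node.get("type", "<unknown>") for node in workflow.get("nodes", [])
    )


# Tiny stand-in for a real file. With a downloaded workflow you would use:
#   with open("workflow.json", encoding="utf-8") as f:
#       workflow = json.load(f)
sample = json.loads(
    '{"nodes": [{"id": 1, "type": "LoadImage"},'
    ' {"id": 2, "type": "CFGGuider"},'
    ' {"id": 3, "type": "CFGGuider"}]}'
)

for node_type, count in count_node_types(sample).most_common():
    print(node_type, count)
```

Node types you don't recognize are a hint about which custom node packs need to be installed.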
@LatentHeart I totally understand now. Thank you for the well thought out explanation. I appreciate it!
please help, frame relay alone doesn't work
You mean prompt relay? Did you clone the Github repo?
I just started using this workflow instead of my incredibly jank hodge podge of a ltx 2.3 workflow.
(mind you it works fine just uh....spaghetti lmao)
Man I never thought about using Mel-Band Roformer to split the audio from music and then just simply using the original audio to combine back into the video....... i was manually adding the audio back to the already completed video via a dedicated workflow afterwards XD
ive had melband for quite awhile but never used it much aside from sunoai
hehe You can also combine the split voice audio with sony whoosh, for higher fx audio quality ;)
What is supposed to go in the REFERENCE IMAGE SIZE node? It shows 1920 by default.
Thank you for the WF!
You can leave it as is. It is the resolution of all the keyframes, including the reference image, and it serves as a "safe limit": if you load huge images, they are automatically resized (by the longest edge). You can lower it to save some VRAM if you want; 1280 (720p) should yield good results too. The lower you go, the less detail the model has to work with.
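To make the "safe limit" idea concrete, here is a minimal sketch of a longest-edge resize calculation. It assumes images at or below the limit are left untouched and the aspect ratio is kept; the function name is hypothetical and the workflow's resize node may round differently.

```python
def fit_longest_edge(width, height, limit=1920):
    """Scale (width, height) down so the longest edge is at most `limit`."""
    longest = max(width, height)
    if longest <= limit:
        # Already within the limit: leave the size unchanged (assumption).
        return width, height
    new_w = max(1, round(width * limit / longest))
    new_h = max(1, round(height * limit / longest))
    return new_w, new_h


print(fit_longest_edge(4000, 3000))        # (1920, 1440)
print(fit_longest_edge(4000, 3000, 1280))  # (1280, 960)
print(fit_longest_edge(1024, 768))         # (1024, 768)
```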
@LatentHeart Awesome. Thank you for taking the time to explain!
is there a lipsync option? i add my speech but the subject doesnt talk
Yes, you need to enable the "Custom audio source" group using the fast bypasser nodes. There are two groups where you can add a source audio: the ID LoRA group, used for voice cloning, and the Custom audio source group, used for lipsync. Make sure you are enabling and adding your audio in the Custom audio source group, not in the ID LoRA group. That group has a "Preview audio" node; make sure the voice is coming out of it, and if it isn't, use a better-quality audio file. Also, if the audio has a certain quality or tone, the LTX model will sometimes interpret it as an off-camera or "thought" voice, so you can add the phrase "perfect lipsync" to your prompt to force the model to behave as intended.
Great work! It would be nice if you have a option to make Eros and Sulphur work within your workflow which currently gives burned results. Works fine with the suggested checkpoint though.
Thanks. If those models are compatible with IC-LoRA, ID-LoRA and prompt relay, you can simply swap the VAE, CLIP and MODEL nodes for the ones those models need and it should work just fine. Everything you need to wire is in the "Models" group.
@MrToon Finally figured it out. I described what I had to change to make it work in this post: https://civitai.red/posts/28586183
@craempei Thanks for sharing ;)
so if i had a person as the main reference image, a pic of a red tshirt as a second ref image and a logo as the third ref image, how would i prompt it to make the person wear the red tshirt with the logo on it? my experiments arent working.
This workflow has no editing capabilities, sorry.
@LatentHeart i'm trying to understand what this is about. i presume i would load a ref image, a ref image 2 and have them appear in the same video. how would i prompt for that?
@dukecity258 No, this workflow is capable of doing all sorts of stuff except editing.
The reference images in this workflow serve as keyframes, frames that will appear in the final video. In simple terms, this means the reference images you include are used as "scenes".
What you can do is edit the reference image with a workflow like this one:
https://civitai.red/models/2579807/edit-workflow-2-in-one-flux-klein-9-qwen-edit-2511-sam3
with a prompt like "The person in image 1 is wearing the t-shirt from image 2, with the logo from image 3 printed on the chest area"; and then you can use the resulting image as your reference image for a video.
@LatentHeart Thanks for the explanation! cool wf!
It looks like some nodes are not connected: LTXVConcatAVLatent.execute() missing 1 required positional argument: 'video_latent'
The LTX sampler subgraph doesn't have positive/negative/model/video latent connected. Is this intended? I'm pretty sure I did a fresh download and the connections are missing.
Do you have the latest ComfyUI version?
@LatentHeart oh wow 0.21 literally dropped yesterday. Let me try that. But it's odd that the nodes now don't require inputs.
@timammasd Yes, that's strange. I haven't heard anyone complain about the subgraphs so far.
@LatentHeart Same thing. Fresh download. The LTX sampler subgraph in the main sampler section only has audio latent and noise input set. All other inputs are missing.
@timammasd Ok I honestly don't know what could be the problem, but I can guide you so you can connect what's missing:
Inside that subgraph, the connections should look like the first image, and outside the subgraph, the connections should look like the second image:
https://imgur.com/a/DNiZfye
If you open the subgraph and it looks empty, use the center workflow option to find the nodes; sometimes they are not in the same position as in the main workflow view.
For what you describe, the missing connections are:
- Positive
- Negative
- Video_latent
- Model
So inside the subgraph, you only need to connect those. For example, for the positive input, you can drag the positive input of the "CFGGuider" node to an empty input slot of the subgraph, and then connect whatever else needs a positive input (LTXVCropGuides), and so on.
BTW, don't worry about the order of the inputs; the important thing is that every connection goes to the right nodes.
Just so you know, I just downloaded the file and checked it, and it works fine on my end.
@LatentHeart Thanks for checking. I believe there is something wrong with rgthree. It's just randomly missing connections all over the places related to RGTHREE_CONTEXT. What's your rgthree and comfy UI version? The worst case is I manually wire things up according to your screenshot above...
@timammasd ahhh ok, my rg-three version is:
1.0.2512112053
ComfyUI: v0.20.1 (2026-04-27)
Manager: V3.39
BTW, not all the Context node connections are used, mainly the positive and negative guides, latent, and model.
@LatentHeart Reverted to the same version still got the same. When I turned on rgthree's corrupt workflow detection, it complained "The workflow you've loaded has corrupt linking data that may be able to be fixed.Open fixer | Fix in place" but it wasn't able to fix it. This is very very odd. I even inspected the source. I do see link IDs for LTX sampler subgraph's input.
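For anyone who wants to check link data by hand, the following sketch scans a workflow dict for node inputs that point at link IDs missing from the link table. It is not rgthree's fixer. It assumes the classic editor layout, where "links" is a list of arrays whose first element is the link ID and each node input has a "link" field; subgraph definitions are stored separately and may use a different shape, so this only covers the top-level graph. The function name and sample data are made up for illustration.

```python
def find_dangling_inputs(workflow):
    """Return (node_id, node_type, input_name, link_id) for each input
    whose link ID does not appear in the workflow's link table."""
    link_ids = {link[0] for link in workflow.get("links", [])}
    problems = []
    for node in workflow.get("nodes", []):
        for slot in node.get("inputs") or []:
            link = slot.get("link")
            # An input with no link (None) is just unconnected, not corrupt.
            if link is not None and link not in link_ids:
                problems.append(
                    (node.get("id"), node.get("type"), slot.get("name"), link)
                )
    return problems


# Hypothetical sample: link 7 exists, link 9 is referenced but missing.
sample = {
    "links": [[7, 1, 0, 2, 0, "MODEL"]],
    "nodes": [
        {"id": 1, "type": "Loader", "inputs": []},
        {"id": 2, "type": "CFGGuider", "inputs": [
            {"name": "model", "type": "MODEL", "link": 7},
            {"name": "positive", "type": "CONDITIONING", "link": 9},
            {"name": "negative", "type": "CONDITIONING", "link": None},
        ]},
    ]
}

for problem in find_dangling_inputs(sample):
    print(problem)  # (2, 'CFGGuider', 'positive', 9)
```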
@timammasd Well honestly I got no idea what could be wrong, sorry :'(
@LatentHeart Can you do me a favor and run rgthree's corrupted workflow fixer to see if it complains on your side?
@timammasd Done, I dropped in the file I downloaded a few minutes ago:
⚠️ Found 0 links to fix, and 1 to be removed.
And just as a side note, I'm using the portable version of ComfyUI
"I thought the missing links between nodes in the downloaded workflow file were some kind of gate the author set on purpose, because their reply said that using this workflow requires some troubleshooting skills. I spent forever connecting the nodes by following the example image — gotta say, it did help me understand the workflow better, but some problems still leave me clueless. For example, when doing i2v, I get things like 'TypeError: CFGGuider.execute() missing 1 required positional argument: 'model'' and 'graph.DependencyCycleError: Dependency cycle detected'. I feel like these issues are caused by incorrect node connections. And there are just too many models using SET and GET nodes — MODEL_BASE, MODEL_EXTENDED, EROS_SAMPLING_MODEL, MODEL_LORA, MODEL_FULL, etc. All of this confuses me a lot. It'd be awesome if you could make a tutorial explaining the different connection methods and how to distinguish between things like Standard LTX 2.3 distilled, LTX 2.3 distilled GGUF, and 10Eros, so we can avoid mixing them up."
@LatentHeart "But the error shows up inside the subgraph, and the missing links are on the outside