This workflow currently supports three types of models:
Standard LTX 2.3 distilled
LTX 2.3 distilled GGUF
10Eros
💡 The models are self-contained. You can safely delete the entire group of whichever model you don't use without breaking the workflow. The remaining model groups will work independently without any additional changes needed.
This workflow is a modular and flexible text/image/audio-to-video generation system built in ComfyUI, designed to give full control over video creation using LTX-based models. It allows you to easily mix and match multiple generation modes such as text-to-video, image-to-video, lipsync, and fully guided animation by enabling or disabling grouped nodes.
📝 Personal notes:
The 10Eros model is better for NSFW content, whereas the standard model is better for SFW generations. The body movement of 10Eros can sometimes benefit SFW content too, but as a general rule, use each model for the content it is best at.
Always try to use two-phase sampling (half res + 2x upscaler); it yields the best quality and character consistency. LTX is not good at preserving character identity, so don't make it worse with a single-pass generation. The upscaler model adds extra detail and improves character consistency, which is why I recommend using it.
Don't use the detailer when generating "Amateur look" videos. It adds a light layer of detail to the final result, which most of the time makes it look too "polished" for a real amateur recording; amateur-style videos look more real when they look low quality.
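The half res + 2x upscaler recommendation above is mostly resolution arithmetic, sketched below. This is a minimal sketch, not part of the workflow: it assumes (this post does not say so) that LTX expects width and height in multiples of 32 and frame counts of the form 8n + 1, so the first pass is snapped to that grid and the upscaler then doubles it exactly. `plan_two_pass` is a hypothetical helper name.

```python
# Sketch: plan a two-pass (half-res + 2x upscale) generation.
# ASSUMPTIONS (not taken from the workflow): width/height must be multiples
# of 32 and frame counts must have the form 8*n + 1.

def round_down(value: int, multiple: int) -> int:
    """Round value down to the nearest multiple (never below one multiple)."""
    return max(multiple, (value // multiple) * multiple)


def plan_two_pass(target_w: int, target_h: int, seconds: float, fps: int = 24):
    """Return (first-pass size, final size, frame count) for a 2x upscale."""
    # First pass runs at half resolution, kept on the assumed 32-pixel grid.
    half_w = round_down(target_w // 2, 32)
    half_h = round_down(target_h // 2, 32)
    # The 2x upscaler doubles the first-pass size exactly.
    final_w, final_h = half_w * 2, half_h * 2
    # Snap the frame count down to the nearest 8*n + 1 value.
    raw_frames = int(seconds * fps)
    frames = max(1, ((raw_frames - 1) // 8) * 8 + 1)
    return (half_w, half_h), (final_w, final_h), frames


if __name__ == "__main__":
    first, final, frames = plan_two_pass(1280, 720, seconds=5)
    print(f"pass 1: {first}, pass 2: {final}, frames: {frames}")
```

Under these assumptions a 1280×720, 5-second target at 24 fps becomes a 640×352 first pass, a 1280×704 final video, and 113 frames; check the workflow's own notes for the limits it actually enforces.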
Main features
GGUF support
Prompt relay for segmented prompts
NSFW prompt enhancer
Text, image, audio, and ControlNet-driven video generation
LoRA support (character, style, and voice via ID LoRA)
Custom or AI-generated audio with automatic syncing
Reference image + up to 7 keyframes (FFLF animation control)
ControlNet video guidance with hybrid reference support
Half-res sampling + 2× upscaling for faster high-quality results
LTX detailer for enhanced final output
Common Setups
Text to video:
All bypassers disabled + Prompt + Default audio
Image to video:
Prompt + Reference image + Default audio
Lipsync:
Prompt + Reference image + Custom audio
Audio to video:
Prompt + Custom audio only
Character LoRA + voice cloning:
Prompt + Character LoRA + ID LoRA + Default audio
Voice reference to video:
Prompt + ID LoRA + Default audio
OR
Prompt + ID LoRA + Reference image + Default audio
Character animation:
Prompt + ControlNet + Reference image + (Custom or Default audio)
First frame → last frame:
Prompt + Keyframe 1 + Keyframe 2 + (Custom or Default audio)
First → middle → last frame:
Prompt + Keyframe 1 + Keyframe 2 + Keyframe 3 + (Custom or Default audio)
Character animation with custom voice:
Prompt + Reference image + ID LoRA + ControlNet + Default audio
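Each common setup is just a combination of enabled inputs, so the list can be written down as data. The sketch below is a hypothetical helper, not something shipped with the workflow; the input labels (`prompt`, `reference_image`, `id_lora`, and so on) are names invented here to mirror the list, not node titles in the graph.

```python
# Hypothetical helper: the "Common Setups" list expressed as data.
# The labels below are invented for this sketch, not node names in the workflow.

def with_either_audio(*inputs):
    """Expand '(Custom or Default audio)' into one recipe per audio choice."""
    return [set(inputs) | {"custom_audio"}, set(inputs) | {"default_audio"}]


SETUPS = {
    "Text to video": [{"prompt", "default_audio"}],  # all bypassers disabled
    "Image to video": [{"prompt", "reference_image", "default_audio"}],
    "Lipsync": [{"prompt", "reference_image", "custom_audio"}],
    "Audio to video": [{"prompt", "custom_audio"}],
    "Character LoRA + voice cloning": [
        {"prompt", "character_lora", "id_lora", "default_audio"}
    ],
    "Voice reference to video": [
        {"prompt", "id_lora", "default_audio"},
        {"prompt", "id_lora", "reference_image", "default_audio"},
    ],
    "Character animation": with_either_audio("prompt", "controlnet", "reference_image"),
    "First frame → last frame": with_either_audio("prompt", "keyframe_1", "keyframe_2"),
    "First → middle → last frame": with_either_audio(
        "prompt", "keyframe_1", "keyframe_2", "keyframe_3"
    ),
    "Character animation with custom voice": [
        {"prompt", "reference_image", "id_lora", "controlnet", "default_audio"}
    ],
}


def match_setup(enabled):
    """Return the names of every setup whose recipe is exactly `enabled`."""
    enabled = set(enabled)
    return [name for name, recipes in SETUPS.items() if enabled in recipes]


if __name__ == "__main__":
    print(match_setup({"prompt", "reference_image", "custom_audio"}))
```

An empty result simply means the combination is not one of the documented setups.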
Detailed instructions are contained in the workflow itself:
Red nodes are instructions and useful notes.
Yellow nodes are configurable elements you can adjust to your needs.

Description
Added 10Eros model support.
Added NSFW prompt enhancer support.
Added a model mode option for the upscaler sampler. Users can now choose between upscaling with the base model only or guiding the upscaler with the loaded LoRAs included.
Comments (6)
Really nice WF. I'm enjoying testing it out and it works well. When doing i2v, it sometimes loses face consistency pretty quickly; are there any specific nodes where I could decrease the strength to reduce changes to the original face?
As I explain in the post, this is a common issue with the LTX model. To counter that inconsistency, I recommend using a reference image strength of 0.9 together with half res + 2x upscaler and, if possible, using keyframes to lock the face identity. What works for me: if the scene won't change much during the video, I use the same image as the first and last frames, which forces the model to keep the face consistent. It isn't perfect, but it improves character consistency. If possible, using different keyframes suited to your video will increase consistency further, since you force the model to render specific images in the middle of the video (the keyframes). To create more keyframes from a base image, you can use the next scene Qwen Edit LoRA or your preferred online AI platform.
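The keyframe advice in this reply comes down to choosing a frame position for each image. A minimal sketch, assuming evenly spaced keyframes (the reply does not prescribe any spacing) and using a hypothetical `keyframe_indices` helper with placeholder file names; the cap of 7 comes from the feature list.

```python
# Hypothetical helper: evenly spaced keyframe positions for a clip.
# Even spacing is an assumption of this sketch, not a rule of the workflow.
MAX_KEYFRAMES = 7  # "up to 7 keyframes" per the feature list


def keyframe_indices(num_frames: int, num_keyframes: int) -> list:
    """Return frame indices from the first frame to the last, evenly spaced."""
    if not 2 <= num_keyframes <= MAX_KEYFRAMES:
        raise ValueError(f"expected 2..{MAX_KEYFRAMES} keyframes")
    if num_keyframes > num_frames:
        raise ValueError("more keyframes than frames")
    last = num_frames - 1
    return [round(i * last / (num_keyframes - 1)) for i in range(num_keyframes)]


if __name__ == "__main__":
    # Same image as first and last frame to hold the face, one new pose in the middle.
    images = ["face_front.png", "face_turned.png", "face_front.png"]  # placeholder names
    plan = dict(zip(keyframe_indices(121, len(images)), images))
    print(plan)
```

With 121 frames and three images this places them at frames 0, 60, and 120, reusing the same face image at both ends as the reply suggests.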
I could really use some help. I have spent over 100 hours trying to learn this stuff. I'm using ComfyUI.
This workflow cooks! I've started to use it exclusively for my YouTube videos. It just does it all. Thank you so much for your contribution!
Glad to hear it ;)
I thought the missing links between nodes in the downloaded workflow file were some kind of gate the author set on purpose, because their reply said that using this workflow requires some troubleshooting skills. I spent forever connecting the nodes by following the example image. I have to say it helped me understand the workflow better, but some problems still leave me clueless. For example, when doing i2v, I get errors like:
TypeError: CFGGuider.execute() missing 1 required positional argument: 'model'
graph.DependencyCycleError: Dependency cycle detected
I feel like these issues are caused by incorrect node connections. There are also just too many models passed through SET and GET nodes (MODEL_BASE, MODEL_EXTENDED, EROS_SAMPLING_MODEL, MODEL_LORA, MODEL_FULL, etc.), and all of this confuses me a lot. It'd be awesome if you could make a tutorial explaining the different connection methods and how to tell apart Standard LTX 2.3 distilled, LTX 2.3 distilled GGUF, and 10Eros, so we can avoid mixing them up.
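The two errors in this comment (a required input with no link, and a dependency cycle) can be checked mechanically before queueing. This is a hedged sketch, not a tool from the workflow author. It assumes the file is a ComfyUI UI-format workflow JSON with a `nodes` list and `links` stored as `[id, from_node, from_slot, to_node, to_slot, type]` arrays, that the Set/Get nodes are the KJNodes `SetNode`/`GetNode` types keeping their variable name (e.g. `MODEL_BASE`) in `widgets_values[0]`, and that muted and bypassed nodes are saved with `mode` 2 and 4. Optional inputs are not distinguished, so treat the report as a checklist rather than a verdict.

```python
import json
import sys
from collections import defaultdict

# ASSUMPTIONS: ComfyUI UI-format JSON, array-style links, KJNodes SetNode/GetNode
# storing the variable name in widgets_values[0], muted = mode 2, bypassed = mode 4.
SKIP_MODES = {2, 4}


def build_edges(workflow):
    """Map node id -> ids it feeds, plus a virtual edge from each SetNode to its GetNodes."""
    edges = defaultdict(set)
    for link in workflow.get("links", []):
        edges[link[1]].add(link[3])  # [link_id, from_node, from_slot, to_node, to_slot, type]
    setters, getters = defaultdict(list), defaultdict(list)
    for node in workflow.get("nodes", []):
        values = node.get("widgets_values")
        name = values[0] if isinstance(values, list) and values else None
        if name is None:
            continue
        if node.get("type") == "SetNode":
            setters[name].append(node["id"])
        elif node.get("type") == "GetNode":
            getters[name].append(node["id"])
    for name, set_ids in setters.items():
        for set_id in set_ids:
            edges[set_id].update(getters.get(name, []))
    return edges, setters, getters


def find_cycle(edges):
    """Return one cycle as a list of node ids (first id repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)
    for start in list(edges):
        if color[start] != WHITE:
            continue
        color[start] = GREY
        path, stack = [start], [(start, iter(edges[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color[child] == GREY:  # back edge: child is on the current path
                    return path[path.index(child):] + [child]
                if color[child] == WHITE:
                    color[child] = GREY
                    path.append(child)
                    stack.append((child, iter(edges[child])))
                    break
            else:  # all children explored
                color[node] = BLACK
                path.pop()
                stack.pop()
    return None


def check_workflow(workflow):
    """Report unlinked inputs, GetNodes without a SetNode, and one dependency cycle."""
    edges, setters, getters = build_edges(workflow)
    unconnected = []
    for node in workflow.get("nodes", []):
        if node.get("mode") in SKIP_MODES:
            continue
        for inp in node.get("inputs") or []:
            # Inputs that mirror a widget carry a value even without a link.
            if inp.get("link") is None and "widget" not in inp:
                unconnected.append((node["id"], node.get("type"), inp.get("name")))
    return {
        "unconnected_inputs": unconnected,
        "get_without_set": sorted(n for n in getters if n not in setters),
        "cycle": find_cycle(edges),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print(json.dumps(check_workflow(json.load(f)), indent=2))
```

Run it as `python check_workflow.py workflow.json` (the script name is arbitrary). An entry like `[3, "CFGGuider", "model"]` under `unconnected_inputs` would correspond to the `CFGGuider.execute()` error above, and a non-empty `cycle` lists the node ids that feed back into each other.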
