LTX-2.3 Dev Audio+Image To Video (GGUF)

Base workflow for audio+image to video with the Dev model, set up to use as little VRAM as possible.

It can also generate text-to-video with an audio reference (switch the red boolean node to TRUE).

I suggest leaving the prompt alone unless you want to prompt for a specific motion or action.

Prompt:

"Transform this static image into a high-quality video with realistic facial expressions and realistic motion.

Perfect lip-sync to the attached audio."

FILES:

OPTIONAL: Kijai's fp8 scaled model (requires a Load Diffusion Model node instead of the UNet loader node, and replaces the GGUF entirely)

https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/diffusion_models

Dev GGUF (distilled GGUFs are in the same repo)

https://huggingface.co/unsloth/LTX-2.3-GGUF/tree/main

Gemma 3 12B FP4 text encoder

https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors

Audio VAE

https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors

Video VAE

https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors

Text Projection text encoder

https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders

Distilled LoRA

https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors

Upscaler

https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors
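If you would rather script the downloads than click through each link, here is a minimal sketch. It only covers the direct file links listed above; the GGUF and text projection links point at folders, so you still pick those files by hand. The ComfyUI subfolder names (`text_encoders`, `vae`, `loras`, `latent_upscale_models`) and the `download_all` helper are assumptions for illustration, not part of the workflow, and the download step needs the `huggingface_hub` package installed.

```python
from urllib.parse import urlparse

# Direct file links copied from the FILES list, grouped by the ComfyUI
# models subfolder they are ASSUMED to belong in (adjust to your install).
FILE_URLS = {
    "text_encoders": [
        "https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors",
    ],
    "vae": [
        "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors",
        "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors",
    ],
    "loras": [
        "https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors",
    ],
    "latent_upscale_models": [
        "https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
    ],
}


def parse_hf_url(url):
    """Split a huggingface.co 'blob' link into (repo_id, revision, path_in_repo).

    Folder ('tree') links raise ValueError, since they do not name a file.
    Assumes the revision has no slash in it (true for 'main').
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a direct file link: {url}")
    return "/".join(parts[:2]), parts[3], "/".join(parts[4:])


def download_all(comfy_models_dir):
    """Hypothetical helper: fetch every file and copy it into its subfolder."""
    import shutil
    from pathlib import Path

    from huggingface_hub import hf_hub_download  # third-party, imported lazily

    for subdir, urls in FILE_URLS.items():
        target = Path(comfy_models_dir) / subdir
        target.mkdir(parents=True, exist_ok=True)
        for url in urls:
            repo_id, revision, path_in_repo = parse_hf_url(url)
            # Downloads into the local Hugging Face cache and returns the path.
            cached = hf_hub_download(
                repo_id=repo_id, filename=path_in_repo, revision=revision
            )
            shutil.copy(cached, target / Path(path_in_repo).name)
```

Calling `download_all("ComfyUI/models")` (path is an example) would fetch the five files. Note that copying out of the cache keeps a second copy on disk, which matters for files this large.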

Description

A+I2V

Comments (6)

ArtificialOtaku, Mar 22, 2026

Very cool wf, had to modify it so it would take my tensor file and added more lora nodes, but other than that, quite simple and clear to work with, thanks!

gambikules858, Mar 22, 2026

And what about first + last frame?

creatorjulie743, Mar 25, 2026

I don't know. It's not working right. See the posted video. It should have the workflow in it. The only difference is that I used Q8_0 gguf and gemma_3_12b_it text encoder. Oh, and I used resolution 720x1024. Everything else is the same as in the sample workflow.
Funny part is that I tried the sample image (the guy in a baseball hat) and sound clip and it worked. Was using the same Q8_0 gguf and gemma_3_12b_it text encoder and changed the resolution to 768x768. But my own audio and images do not work even when using the same lowered resolution. What gives?

chrisbraeuer41172035, Mar 26, 2026

IDK, I am also having huge problems, even with the official workflow. It either throws errors, or it speaks an alien language, or everything looks bloomed, or there are subtitles everywhere, or the movement is generally bad, and i2v is a joke.

creatorjulie743, Mar 26, 2026

@chrisbraeuer41172035 Well, with straight i2v, I managed to get some decent clips with various workflows, including the default ComfyUI one. Lots of duds, but some clips are pretty decent. But I've tried several ai2v workflows and none of them works even halfway decently.

chrisbraeuer41172035, Mar 29, 2026

@creatorjulie743 I am really not sure. I also got some decent clips. A woman in portrait mode speaking works great, as do speaking portraits in general. But as soon as I try to do something different, it falls off a cliff. It's driving me nuts. Just trying to get someone to walk up some stairs is not possible at all.

Workflows

LTXV 2.3

by realrebelai
