The LTX 2.3 KJ Stripped 1.1 workflow is a specialized audio-visual (AV) generation pipeline designed to interpolate between two static images. It utilizes a "Guided" architecture to ensure the generated video adheres strictly to a designated start and end point.
Core Architecture & Processing
Model Loading: The workflow employs the LTX-Video 2.3 22B Distilled transformer model via a specialized KJ loader node.
Dual-Modality Processing: It initializes both an EmptyLTXVLatentVideo and an LTXVEmptyLatentAudio space.
AV Concatenation: Visual and audio latents are merged into a single latent stream using the LTXVConcatAVLatent node, allowing the sampler to process both simultaneously.
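The merge-then-split idea can be pictured with a small conceptual sketch in plain Python. This is not how LTXVConcatAVLatent or LTXVSeparateAVLatent are implemented; the AVLatent class, the function names, and the toy list shapes are hypothetical. The sketch only shows that both modalities travel through the sampler as one object and can be recovered unchanged afterwards.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AVLatent:
    """Hypothetical container holding both modalities as one stream."""
    video: List[float]
    audio: List[float]


def concat_av(video: List[float], audio: List[float]) -> AVLatent:
    # Bundle the two latents so a single sampler call can see both.
    return AVLatent(video=list(video), audio=list(audio))


def separate_av(latent: AVLatent) -> Tuple[List[float], List[float]]:
    # Split the combined stream back into its video and audio parts.
    return list(latent.video), list(latent.audio)


video_latent = [0.0] * 4  # toy stand-in for an empty video latent
audio_latent = [0.0] * 2  # toy stand-in for an empty audio latent

combined = concat_av(video_latent, audio_latent)
video_out, audio_out = separate_av(combined)
```

In the real workflow the latents are tensors rather than lists, and the sampler modifies the combined stream between the two calls.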
Frame Guidance System
Start Frame Anchor: An image is loaded and preprocessed to serve as the reference for frame_idx: 0.
End Frame Anchor: A second image is loaded and preprocessed to serve as the reference for the final frame (frame_idx: -1).
Guide Application: The LTXVAddGuide nodes apply these images to the latent space with a configurable strength (set to 0.7 in this version) to dictate the video’s trajectory.
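The two anchors can be sketched as plain data. The FrameGuide record, the resolve_frame_idx helper, the file names, and the 97-frame clip length below are all hypothetical. The sketch assumes that a negative frame_idx counts back from the end of the clip in the same way as a Python list index; it does not reproduce the node's internal logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameGuide:
    """Hypothetical record of one guide image's placement and weight."""
    image_path: str
    frame_idx: int
    strength: float


def resolve_frame_idx(frame_idx: int, num_frames: int) -> int:
    # Assumption: negative indices count back from the end, as in Python lists.
    resolved = frame_idx if frame_idx >= 0 else num_frames + frame_idx
    if not 0 <= resolved < num_frames:
        raise IndexError(
            f"frame_idx {frame_idx} is out of range for {num_frames} frames"
        )
    return resolved


guides = [
    FrameGuide("start.png", frame_idx=0, strength=0.7),  # start frame anchor
    FrameGuide("end.png", frame_idx=-1, strength=0.7),   # end frame anchor
]

num_frames = 97  # hypothetical clip length, not taken from the workflow
anchored = [resolve_frame_idx(g.frame_idx, num_frames) for g in guides]
```

With these assumed values the start guide lands on the first frame and the end guide on the last. A strength below 1.0 suggests the guides steer the result rather than pin it exactly.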
Sampling & Decoding
KSampler Configuration: The workflow uses a high-speed 8-step Euler sampler with a simple scheduler.
Separation & Decoding: After sampling, the LTXVSeparateAVLatent node splits the results back into distinct video and audio streams.
Tiled Decoding: Visuals are decoded using VAEDecodeTiled to manage high-resolution output (1024x1024) efficiently.
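For readers unfamiliar with what an 8-step Euler sampler does, the following sketch shows the Euler update in the form commonly used by diffusion samplers, applied to a single scalar instead of a latent tensor. The evenly spaced noise levels and the toy denoiser are assumptions for illustration; the values produced by the workflow's scheduler for this model may differ.

```python
def euler_sample(x, sigmas, denoise):
    """Integrate from sigmas[0] down to sigmas[-1] with plain Euler steps."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)      # model's estimate of the clean sample
        d = (x - denoised) / sigma        # slope of x with respect to sigma
        x = x + (sigma_next - sigma) * d  # one Euler step toward sigma_next
    return x


steps = 8
# Assumption: evenly spaced noise levels from 1.0 down to 0.0 (steps + 1 values).
sigmas = [1.0 - i / steps for i in range(steps + 1)]

target = 0.25  # toy "clean" value the stand-in denoiser always predicts


def toy_denoiser(x, sigma):
    # Stand-in for the real model: ignores its inputs and returns the target.
    return target


result = euler_sample(3.0, sigmas, toy_denoiser)
```

Each step costs one model evaluation, so eight steps means eight forward passes of the transformer, which is where the speed of a low step count comes from.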
Enhancement & Output
NVIDIA RTX Integration: The workflow includes an RTXVideoSuperResolution node that provides a 2x hardware-accelerated upscale on the final image sequence.
Final Assembly: The VHS_VideoCombine nodes produce two versions of the video (a base generation and a super-resolved version), both featuring synchronized, AI-generated audio.
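As a quick check on the output sizes, the arithmetic below assumes the 2x factor applies to each dimension of the 1024x1024 base output. The upscaled_size helper is hypothetical and only does the multiplication; it does not call the RTX node.

```python
from typing import Tuple


def upscaled_size(width: int, height: int, factor: int = 2) -> Tuple[int, int]:
    # Scale both dimensions by the same integer factor.
    return width * factor, height * factor


base = (1024, 1024)
upscaled = upscaled_size(*base)

# Ratio of pixel counts between the super-resolved and base frames.
pixel_ratio = (upscaled[0] * upscaled[1]) / (base[0] * base[1])
```

Under that assumption the super-resolved video is 2048x2048, four times the pixel count per frame of the base version.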