Unleash the power of pure text-to-video generation! This "MotionForge" ComfyUI workflow is your all-in-one solution for creating dynamic, high-quality videos directly from your imagination. No starting image is needed—just a powerful prompt.
This streamlined pipeline leverages the best of the Wan2.2 ecosystem:
Text-to-Video Generation: Harnesses the massive Wan2.2-T2V-A14B models for robust initial video creation from your text descriptions.
Lightning-Fast Motion: Integrates the revolutionary LightX2V 4-Step LoRAs, drastically reducing the number of steps needed for smooth, coherent motion.
Style Fusion: Optionally applies a FLUX style LoRA to add unique aesthetic flair to your generations.
HD Latent Upscaling: Refines and enlarges the video using the efficient Wan2.2-Fun-5B-InP model, enhanced by the FastWan LoRA for quick, high-quality results.
Cinematic Finish: Delivers a final, buttery-smooth 32FPS output, upscaled and ready for display.
Go from a simple idea to a stunning animated video in one seamless process.
✨ Features & Highlights
True Text-to-Video: Generate videos from text prompts alone—no input image required. Perfect for bringing entirely new concepts to life.
Ultra-Efficient 4-Step Generation: The included LightX2V LoRAs are a game-changer, producing high-quality motion in a fraction of the usual steps.
Style Customization: Built-in integration for a FLUX style LoRA, allowing you to easily tweak the artistic output of your videos.
Two-Pass Quality Pipeline: Uses both a High-Noise and Low-Noise model path for optimal detail and motion clarity.
HD Upscaling & Refinement: The dedicated 5B upscaler node cleans up and enlarges your video for a professional finish.
Optimized Performance: Includes `cleanGpuUsed` nodes to help manage VRAM throughout the complex generation process.
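For context, a VRAM-cleanup pass between heavy stages boils down to something like the sketch below. This is an illustration of the idea only, not `cleanGpuUsed`'s actual implementation.

```python
# Rough idea of what a GPU-cleanup node does between stages -- a sketch,
# not cleanGpuUsed's actual code.
import gc
import torch

def clean_gpu() -> None:
    gc.collect()                    # drop unreferenced Python objects first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()    # release cached CUDA allocations
        torch.cuda.ipc_collect()    # reclaim memory from dead IPC handles
```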
📦 Required Models (Please Download First!)
For this workflow to function, you must download the following models and place them in the appropriate subfolders of your ComfyUI `models` directory.
1. Core Wan2.2 T2V GGUF Models:
Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf
Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf
Wan2.2-Fun-5B-InP-Q8_0.gguf (for upscaling)
Source: https://huggingface.co/QuantStack (check for T2V-specific GGUF files)
2. Motion & Style LoRAs (for A14B T2V):
lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors (the key to 4-step generation!)
Wan2.2-Lightning_T2V-v1.1-A14B-4steps-lora_LOW_fp16.safetensors
Source: (Typically found alongside other Wan2.2 LoRAs on Hugging Face)
aidmaMJ6.1-FLUX-v0.5.safetensors (optional style LoRA)
Source: (Search Civitai or Hugging Face for FLUX LoRAs)
3. Upscaler LoRA (for 5B):
Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors (drastically reduces the steps required for upscaling!)
4. VAE & Upscaler:
Wan2_1_VAE_fp32.safetensors (for initial generation)
Wan2.2_VAE.safetensors (for the upscaler sub-graph)
RealESRGAN_x2plus.pth (standard upscaling model)
Source: https://huggingface.co/dtarnow/UPscaler/tree/main (or any standard model repository)
5. CLIP Encoder:
umt5-xxl-encoder-Q8_0.gguf (typically bundled with the Wan GGUF downloads)
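If you prefer scripting the downloads, here is a minimal sketch using the `huggingface_hub` package. The repo IDs below are assumptions (verify the exact repos and filenames on the QuantStack page), and the target subfolders follow the usual ComfyUI layout: GGUF diffusion models in `models/unet`, LoRAs in `models/loras`, VAEs in `models/vae`, the text encoder in `models/clip`, and RealESRGAN in `models/upscale_models`.

```python
# Minimal download sketch with huggingface_hub. Repo IDs are placeholders --
# confirm the exact repos/filenames on Hugging Face before running.
from huggingface_hub import hf_hub_download

MODELS_DIR = "/path/to/ComfyUI/models"  # adjust to your install

downloads = [
    # (repo_id, filename, target subfolder under models/)
    ("QuantStack/Wan2.2-T2V-A14B-GGUF", "Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf", "unet"),
    ("QuantStack/Wan2.2-T2V-A14B-GGUF", "Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf", "unet"),
    ("QuantStack/Wan2.2-Fun-5B-InP-GGUF", "Wan2.2-Fun-5B-InP-Q8_0.gguf", "unet"),
    ("dtarnow/UPscaler", "RealESRGAN_x2plus.pth", "upscale_models"),
]

for repo_id, filename, subfolder in downloads:
    hf_hub_download(repo_id=repo_id, filename=filename,
                    local_dir=f"{MODELS_DIR}/{subfolder}")
    print(f"fetched {filename} -> models/{subfolder}/")
```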
⚙️ Installation & Usage
Download the Workflow: Download the provided `.json` file from this Civitai page.
Download All Models: Ensure you have all the models listed above downloaded to the correct folders.
Load in ComfyUI: Open ComfyUI, drag the `.json` file into the window, and the workflow will load.
Check Loaders: The workflow uses ComfyUI-GGUF and ComfyUI-VideoHelperSuite (VHS). Please ensure you have these custom nodes installed.
Craft Your Prompt: This is a Text-to-Video workflow, so leave the `start_image` input on the `WanImageToVideo` node disconnected. Modify the positive and negative prompts in the CLIP Text Encode nodes. The provided example creates a fun "cat surfing selfie" video.
Set Your Video Size: Adjust the `width` and `height` in the `WanImageToVideo` node (default is 400x544; see the sanity-check sketch after these steps).
Queue Prompt! You're ready to go. The workflow will handle the rest, from T2V generation to upscaling and interpolation.
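Before changing the resolution, note that Wan-family models generally expect width and height divisible by 16, and frame counts of the form 4k + 1. A quick sanity check under those assumptions (these are common Wan conventions, not values read from this workflow):

```python
# Sanity-check video settings against common Wan conventions -- assumed
# constraints, not values extracted from this workflow.
def check_video_settings(width: int, height: int, frames: int) -> None:
    assert width % 16 == 0 and height % 16 == 0, "use multiples of 16"
    assert frames % 4 == 1, "use a frame count of the form 4k + 1 (e.g. 81)"

check_video_settings(400, 544, 81)  # the workflow's default 400x544
```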
Pro Tip: The workflow uses a two-stage KSampler. The first stage (4 steps) creates the motion, and the second stage (4 steps) refines it. You can adjust the `cfg` and `steps` in these samplers to fine-tune your results.
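Here is a rough picture of how such a two-stage split typically looks, expressed as KSamplerAdvanced-style settings. The values are illustrative assumptions (especially `cfg` and the total step count), not numbers read from the workflow, so check the actual sampler nodes before changing anything.

```python
# Illustrative two-stage sampler split -- assumed values, not the workflow's.
high_noise_stage = {   # stage 1: the high-noise model builds the motion
    "steps": 8,        # shared schedule length across both stages
    "start_at_step": 0,
    "end_at_step": 4,
    "cfg": 1.0,        # step-distilled LoRAs usually run near cfg 1.0
    "add_noise": "enable",
    "return_with_leftover_noise": "enable",  # hand the latent off mid-schedule
}
low_noise_stage = {    # stage 2: the low-noise model refines detail
    "steps": 8,
    "start_at_step": 4,
    "end_at_step": 8,
    "cfg": 1.0,
    "add_noise": "disable",  # the latent already carries leftover noise
    "return_with_leftover_noise": "disable",
}
```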
Conclusion
The "MotionForge" workflow demystifies high-quality text-to-video generation. By combining the latest specialized models and LoRAs, it offers a powerful yet surprisingly efficient path from a text prompt to a polished video. It's perfect for creators who want to explore the limitless possibilities of AI-driven animation without any initial imagery.
We can't wait to see what you create! Share your results, like, and follow for more powerful workflows.