    WAN2.2 5B Ultimate Suite - T2V, I2V & T2I2V Pro - v1.0

    The most advanced and comprehensive WAN2.2 5B workflow on CivitAI. This all-in-one suite masterfully combines Text-to-Video, Image-to-Video, and Text-to-Image-to-Video generation, powered by local LLMs (Ollama) for intelligent, dynamic prompt enhancement. Stop using basic prompts; generate cinematic, fluid animations with intelligent motion design.

    Workflow Description

    Unleash the full potential of the WAN2.2 5B model with this meticulously designed, feature-packed ComfyUI workflow. This isn't just a simple pipeline; it's a professional content creation suite that intelligently bridges the gap between your ideas and stunning AI-generated video.

    Why This Workflow Stands Out:

    * 🤖 AI-Powered Intelligence: Integrated Ollama LLMs analyze your text or images to generate richly detailed, dynamic prompts specifically engineered for WAN2.2's video capabilities. It translates static concepts into descriptions full of motion, lighting, and cinematic life.

    * 🎬 Multi-Modal Mastery: Seamlessly switch between three powerful generation modes without changing workflows.

    * ⚙️ Optimized & Robust: Built with stability and efficiency in mind. Includes automatic GPU memory management, frame interpolation, and a professional video output system.

    * 🔄 All-in-One Pipeline: From a simple idea or image to a final, smooth video file, everything is connected and automated.

    Features & Technical Details

    🧩 Core Components:

    * Model: wan2.2_ti2v_5B_fp16.safetensors

    * VAE: wan2.2_vae.safetensors

    * Key LoRAs: Wan2.2_5B_FastWanFullAttn (FastWan acceleration LoRA for low-step sampling)

    * Upscaler: Integrated for pre-processing input images.

    * Frame Interpolation: RIFE VFI for buttery-smooth 2x frame generation (outputs 24fps and 48fps videos).

    🔧 Integrated AI Engines (Ollama):

    * For Text (T2V): huihui_ai/gemma3-abliterated:12b-q8_0 - Analyzes your simple text and generates detailed video prompts with motion, camera work, and atmosphere.

    * For Vision (I2V): qwen2.5vl:7b-q8_0 - Analyzes any image you provide and writes the perfect animation prompt based on its content.

    * For T2I (Flux Group): gemma3:latest - Enhances simple text descriptions for high-quality image generation which can then be animated.
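
    What the prompt-enhancement step boils down to: a local LLM rewrites your short idea as a detailed video prompt. Here is a minimal Python sketch of that idea using the official ollama client (pip install ollama); the system instruction below is an illustrative assumption, not the exact prompt wired into the workflow's nodes:

    import ollama

    # Illustrative system instruction; the workflow node carries its own.
    SYSTEM = ("Rewrite the user's idea as a detailed cinematic video prompt: "
              "describe motion, camera work, lighting, and atmosphere.")

    response = ollama.chat(
        model="huihui_ai/gemma3-abliterated:12b-q8_0",  # the T2V engine listed above
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "a knight drawing his sword in a rainy forest"},
        ],
    )
    print(response["message"]["content"])  # the enhanced prompt that drives WAN2.2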

    📊 Output:

    * Resolution: Adapts to your input image size or defined latent size.

    * Frames: Configurable length (default: 121 frames).

    * Format: MP4 (H.264) with workflow metadata embedded.

    * Dual Output: Standard 24fps and interpolated 48fps videos are saved automatically.
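
    For reference on clip length: assuming the video nodes map N frames at F fps to N/F seconds, and that 2x interpolation roughly doubles the frame count, the defaults come out to about a five-second clip either way:

    frames = 121            # default frame count
    print(frames / 24)      # ≈ 5.0 s at 24fps (standard output)
    print(frames * 2 / 48)  # ≈ 5.0 s at 48fps (interpolated): same length, twice as smooth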

    How to Use / Steps to Run

    Prerequisites:

    1. ComfyUI Manager: Essential for installing missing custom nodes.

    2. Ollama: Installed and running on your system. You must pull the required LLM models (gemma3, qwen2.5vl, etc.); a pull script is sketched after this list.

    3. All Models/LoRAs: Ensure all paths in the workflow point to files you actually have. The most common error is a missing model!

    4. Custom Nodes: The workflow will prompt you to install any missing nodes via ComfyUI Manager. Key node suites include:

    * comfyui-ollama

    * comfyui-videohelpersuite

    * comfyui-frame-interpolation

    * comfyui-easy-use

    * comfyui-gguf (for loading Flux GGUF models)
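
    If you prefer pre-pulling the LLMs from a script instead of the CLI, here is a minimal sketch using the official ollama Python client; the model tags are taken from the engine list above, so adjust them to whatever your nodes actually reference:

    import ollama

    for model in (
        "huihui_ai/gemma3-abliterated:12b-q8_0",  # T2V prompt enhancement
        "qwen2.5vl:7b-q8_0",                      # I2V image analysis
        "gemma3:latest",                          # T2I (Flux group)
    ):
        ollama.pull(model)  # same effect as `ollama pull <model>` on the CLI
        print(f"pulled {model}")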

    Usage Instructions:

    Mode 1: TEXT-to-VIDEO (T2V)

    1. Locate the green "Enter simple prompt here" node.

    2. Replace the text with your simple idea (e.g., "a knight drawing his sword in a rainy forest").

    3. Ensure the OllamaConnectivityV2 node points to your Ollama server (the saved default is http://192.168.0.210:11434; a quick connectivity check is sketched after these steps).

    4. Queue Prompt. Watch the Ollama node generate a detailed cinematic prompt, which is then used to create the video.
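
    Before queuing, it can save a failed run to verify the server address from outside ComfyUI. A quick connectivity sketch (substitute whatever URL your OllamaConnectivityV2 nodes use):

    import requests

    OLLAMA_URL = "http://localhost:11434"  # or http://192.168.0.210:11434, etc.

    resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)  # lists locally pulled models
    resp.raise_for_status()
    print("Reachable. Local models:", [m["name"] for m in resp.json().get("models", [])])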

    Mode 2: IMAGE-to-VIDEO (I2V)

    1. In the "Load Image" node, upload your starting image.

    2. The image will be automatically analyzed by the Qwen vision model.

    3. The Ollama node will generate a motion prompt tailored to the image's content.

    4. Queue Prompt. The workflow will animate your image based on the AI-generated description.
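
    Conceptually, the vision step asks the local VLM for a motion description of your image. An illustrative sketch with the ollama Python client; the instruction text here is an assumption, since the node ships with its own prompt:

    import ollama

    response = ollama.chat(
        model="qwen2.5vl:7b-q8_0",
        messages=[{
            "role": "user",
            "content": ("Describe how this scene should move: subject motion, "
                        "camera work, and lighting changes. Write it as a video prompt."),
            "images": ["input.png"],  # path to the image you loaded
        }],
    )
    print(response["message"]["content"])  # the motion prompt used for animation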

    Mode 3: TEXT-to-IMAGE-to-VIDEO (T2I2V)

    1. Use the Flux/Krea group (on the left side of the workflow).

    2. In the PrimitiveStringMultiline node, enter a description for the image you want to generate (e.g., "a gorilla in the jungle eating a banana").

    3. Run the prompt. This group will generate a high-quality image.

    4. Once the image is generated, you can manually connect it to the main I2V pipeline or use the provided "Auto-last frame extract" group to automatically find the latest generated image and animate it.
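
    The "Auto-last frame extract" group is roughly equivalent to grabbing the newest image from the output folder, as in this sketch (the folder path and extension are assumptions; adjust to your install):

    from pathlib import Path

    output_dir = Path("ComfyUI/output")  # adjust to your ComfyUI install
    images = [p for p in output_dir.rglob("*.png") if p.is_file()]
    latest = max(images, key=lambda p: p.stat().st_mtime)  # raises if no images exist yet
    print("Most recent image:", latest)  # this is what gets fed back into the I2V pipeline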

    ⏯️ Output: Your finished videos will be saved to your ComfyUI output/video/ folder. The workflow also saves a preview of the first frame.

    Tips & Tricks

    * Ollama Server: The workflow ships pre-configured for http://192.168.0.210:11434. You MUST change this in all three OllamaConnectivityV2 nodes to http://localhost:11434 or your own server's address.

    * Speed vs. Quality: Adjust the steps in the KSampler (default 8). Lower is faster, higher may yield better quality.

    * Control: You can bypass the Ollama nodes entirely. Just plug your own expertly crafted positive prompt directly into the "CLIP Text Encode (Positive Prompt)" node.

    * Troubleshooting: If you get errors, check the ComfyUI console. Most issues are due to incorrect Ollama server addresses or missing model files.

    This workflow represents the cutting edge of accessible AI video generation. It demonstrates how leveraging multiple AI systems together (diffusion models + LLMs) creates results far beyond what any single model can achieve alone.

    Enjoy creating, and please share your amazing results!


    Workflows
    Wan Video 2.2 TI2V-5B

    Details

    Downloads: 398
    Platform: CivitAI
    Platform Status: Available
    Created: 8/25/2025
    Updated: 9/28/2025
    Deleted: -

    Files

    wan225BUltimateSuiteT2V_v10.zip
