MOCHI VIDEO GENERATOR
(results are in the v1, v2, etc gallery, click the tabs at the top)
True i2v workflow added from V8 onwards, details in the main Article
video TBA
Showcase Special: (created with mostly one ACE-HOLO promptgen line)
pack update V7 + special Video promptgen guide with ACE-HoloFS.
V7 Demo Reel (made with Shuffle Video Studio)
Roundup of the research so far, with some more detailed instructions/info
Current leader: (V7 gallery) (V8 adds image encoding)
"\V7-Spatial-Tiling-VAE\Donut-Mochi-848x480-t2v-BatchedLatentSideload-v55"
i2v version used LLM Video prompt gen, t2v used my Zenkai-prompt + DJZ-LoadLatent.
WIP project by Kijai
Info/Setup/Install guide: https://civarchive.com/articles/8313
Requires Torch 2.5.0 minimum, so update your Torch if you are behind.
As with the CogVideo workflows, these are provided for people who want to try the preview :)
Even with a 4090 it can push the limits a little. In V1 I provide the workflows I used to research Tile Optimisation;
We're reducing tile sizes by roughly 20-40% from the defaults
We're increasing the frame batch size to compensate
Maintaining the same overlap factors to prevent visible seams
Key principles:
Tile sizes should ideally be multiples of 32 for most efficient processing
Keep width:height ratio similar to the original tile sizes
Frame batch size increases should be modest to avoid frame skipping
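The principles above can be sketched as a small helper. This is purely illustrative (the function name and defaults are made up, not part of the MochiWrapper API), but it shows how to shrink tiles while keeping them multiples of 32 and roughly the same aspect ratio:

```python
def suggest_tile_size(default_w, default_h, reduction=0.25):
    """Shrink default tile dims by ~20-40% while keeping them
    multiples of 32 and roughly the same width:height ratio."""
    def round_to_32(x):
        # snap to the nearest multiple of 32, never below 32
        return max(32, int(round(x / 32)) * 32)

    new_w = round_to_32(default_w * (1 - reduction))
    new_h = round_to_32(default_h * (1 - reduction))
    return new_w, new_h

# e.g. shrinking a 256x128 default tile by 25%:
print(suggest_tile_size(256, 128))  # -> (192, 96)
```

A 25% reduction of 256x128 lands on 192x96, which matches the batch16 workflow values below.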
Researchers Tip!
If you work with a fixed seed, the sampler results remain in memory. The first generation took ~1700 seconds; after that, changes can be made to the Decoder and the next video takes only ~23 seconds. All the heavy work is done by the Sampler, so unless we take a new seed it will reuse the same samples over and over, and VAE decode speed is very good!
^ subsequent gens on same seed are very fast, allowing tuning of the decoder settings ^
^ initial generation was taking ~1700 with pytorch 2.5.0 SDP ^
V1 Workflows:
outputs labelled and added to V1 gallery, test prompt used:
"In a bustling spaceport, a diverse crowd of humans and aliens board a massive interstellar cruise ship. Robotic porters effortlessly handle exotic luggage, while holographic signs display departure times in multiple languages. A family of translucent, floating beings drift through the security checkpoint, their tendrils wrapping around their travel documents. In the sky above, smaller ships zip between towering structures, their ion trails creating an ever-changing tapestry of light."
\Decoder-Research\Donut-Mochi-848x480-batch10-default-v5
= Author Default Settings
This version used the recommended config from Author
\Decoder-Research\Donut-Mochi-640x480-batch10-autotile-v5
= Reduced size, Auto Tiling
- This is my first run, which created the video in the gallery, simply using Auto Tile on the decoder and reducing the overall dimensions to 640x480. This reduction makes generation take less memory, but it is heavy-handed and will reduce the quality of outputs.
The remaining workflows all investigate the possible configs without using Auto Tiling, so we know exactly what was used. Videos will be labelled with the batch count and added to the V1 gallery. Community research is required!
\Decoder-Research\Donut-Mochi-848x480-batch12-v5
frame_batch_size = 12
tile_sample_min_width = 256
tile_sample_min_height = 128
\Decoder-Research\Donut-Mochi-848x480-batch14-v5
frame_batch_size = 14
tile_sample_min_width = 224
tile_sample_min_height = 112
\Decoder-Research\Donut-Mochi-848x480-batch16-v5
frame_batch_size = 16
tile_sample_min_width = 192
tile_sample_min_height = 96
\Decoder-Research\Donut-Mochi-848x480-batch20-v5
frame_batch_size = 20
tile_sample_min_width = 160
tile_sample_min_height = 96
\Decoder-Research\Donut-Mochi-848x480-batch24-v5
frame_batch_size = 24
tile_sample_min_width = 128
tile_sample_min_height = 64
\Decoder-Research\Donut-Mochi-848x480-batch32-v5
frame_batch_size = 32
tile_sample_min_width = 96
tile_sample_min_height = 48
The last workflow is a hybrid approach; the increased overlap factors (0.3 instead of 0.25) might help reduce visible seams when using very small tiles.
\Decoder-Research\Donut-Mochi-848x480-batch16-v6
frame_batch_size = 16
tile_sample_min_width = 144
tile_sample_min_height = 80
tile_overlap_factor_height = 0.3
tile_overlap_factor_width = 0.3
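For reference, the overlap factor is a fraction of the tile dimension, so bumping it from 0.25 to 0.3 widens the blended seam region. A quick back-of-envelope sketch (assuming overlap = tile size × factor, which is how such factors usually work; not verified against the node's source):

```python
def overlap_px(tile, factor):
    # blended seam width in pixels, assuming overlap = tile * factor
    return int(round(tile * factor))

for factor in (0.25, 0.3):
    print(factor, overlap_px(144, factor), "x", overlap_px(80, factor))
# 0.25 -> 36 x 20 px of overlap; 0.3 -> 43 x 24 px
```

So the hybrid config blends a slightly wider strip between the small 144x80 tiles.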
V2 Workflow
\CFG-Research\Donut-Mochi-848x480-batch16-CFG7-v7
This used the Donut-Mochi-848x480-batch16-v6 workflow with 7.0 CFG.
This seems to be a good setting; generation time is 24 minutes with this setup.
(pytorch SDP used)
V3 Workflows
\FP8--T5-Scaled\Donut-Mochi-848x480-batch16-CFG7-T5scaled-v8
We decided to use the FP8_Scaled T5 CLIP model; this improved the outputs greatly across all prompts tested. Check the V3 gallery. This is the best so far! (until we beat it)
\GGUF-Q8_0--T5-Scaled\Donut-Mochi-848x480-b16-CFG7-T5scaled-Q8_0-v9
This did not yield the best results, probably because the T5 scaled CLIP was still in FP8 while we were testing GGUF Q8_0 as the main model.
V4 Workflow
\T5-FP16-CPU\Donut-Mochi-848x480-b16-CFG7-CPU_T5-FP16-v11
Used T5XXL in FP16 by forcing it onto the CPU. It shows the same artifacts as V3, where we used GGUF Q8_0 with T5XXL FP8.
V5 Workflows
\GGUF-Q8_0--T5-FP16-CPU\Donut-Mochi-848x480-GGUF-Q8_0-CPU_T5-FP16-v14
These were the best settings with VAE Tiling enabled; increasing the steps will of course increase both the quality and the time taken.
Increasing steps to 100-200 improves quality at the expense of time: 200 steps takes 45 minutes. There is likely no dedicated version for this, because anybody can add more steps to any of these workflows and just wait a very long time for a 6-second video. This can be remedied with a cloud setup and a larger GPU/VRAM allocation.
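Since sampling time scales roughly linearly with step count, the 200-steps/45-minutes data point lets you estimate other settings. This is a rough rule of thumb only; actual times depend on hardware and the rest of the config:

```python
def estimate_minutes(steps, ref_steps=200, ref_minutes=45):
    # assumes sampling time is roughly linear in step count
    return steps * ref_minutes / ref_steps

for s in (50, 100, 200):
    print(s, "steps -> ~", estimate_minutes(s), "min")
# 50 -> ~11.25, 100 -> ~22.5, 200 -> ~45.0
```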
V6 Workflows
\Fast-25-Frames\Donut-Mochi-848x480-Fast-v4
Used VAE Tiling with 25 frames to generate 1 second of video. With 50 steps this takes a few minutes; 4-5 minutes for 100 steps.
\NoTiling-SaveLoadLatent\Donut-Mochi-848x480-i2v-LatentSideload-v21
Using my new DJZ-LoadLatent node, you can save the sampler results as .latent files on disk. This makes it possible to decode the latents as a separate stage, eliminating the need for the Tiling VAE. This is image to video: it uses OneVision to estimate a video prompt from any given image, and it automatically detects tall or wide aspect ratios and crops/fills to 16:9 or 9:16. NOTE: more testing must be done to prove that tall-aspect quality is good.
\NoTiling-SaveLoadLatent\Donut-Mochi-848x480-t2v-LatentSideload-v25
This is the text to video version of the previous workflow: we drop OneVision and ImageSizeAdjusterV3 and add Zenkai-Prompt-V2 back in to take advantage of our prompt lists. Full instructions are found in the workflow notes.
The Save/Load Latent approach allows us to drop the Tiling VAE, which introduced ghosting to all videos regardless of the settings; as overall quality improved, the ghosting became more apparent.
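The two-stage idea can be sketched as follows, using a NumPy .npz file as a stand-in for the real .latent container (illustrative only; this does not reproduce the DJZ-LoadLatent node or ComfyUI's actual latent file format, and the tensor shape is made up):

```python
import numpy as np

def save_latents(path, latents):
    # stage 1: persist sampler output so decoding can run as a separate pass
    np.savez(path, latent=latents)

def load_latents(path):
    # stage 2: reload and hand off to the (non-tiled) VAE decode
    return np.load(path)["latent"]

# tiny made-up latent tensor for illustration
lat = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
save_latents("sample.latent.npz", lat)
assert np.array_equal(load_latents("sample.latent.npz"), lat)
```

Decoupling the stages means the expensive sampling pass only runs once, and decode settings can be iterated on against the saved file.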
V7 Workflows
Updated the V6 latent sideload workflows to use the newer VAE Spatial Tiling Decoder
This can run 100% on a local GPU, and all the demo videos in the gallery used only 50 steps
(100 steps were used in the V6 gallery). Another significant upgrade!
\V7-Spatial-Tiling-VAE\Donut-Mochi-848x480-t2v-LatentSideload-v50.json
text2video, VAE spatial tiling decoder, with my latent loader
\V7-Spatial-Tiling-VAE\Donut-Mochi-848x480-i2v-LatentSideload-v50.json
pseudo image2video, VAE spatial tiling decoder, with my latent loader
\V7-Spatial-Tiling-VAE\Donut-Mochi-848x480-t2v-BatchLatentSideload-v55.json
text2video, VAE spatial tiling decoder, with my V2 batched latent loader
\V7-Spatial-Tiling-VAE\Donut-Mochi-848x480-i2v-BatchLatentSideload-v55.json
pseudo image2video, VAE spatial tiling decoder, with my V2 batched latent loader
NOTE: V7 is available on GitHub in my DJZ-Workflows pack; however, it will not be published here until the new batch of videos is finished (cooking all night tonight).
V8 Workflows
\True-Image-To-Video\Donut-Mochi-848x480-i2v-LatentSideload-v90.json
image2video, VAE spatial tiling decoder, with my latent loader
\True-Image-To-Video\Donut-Mochi-848x480-i2v-BatchedLatentSideload-v90.json
image2video, VAE spatial tiling decoder, with my V2 batched latent loader
Added true i2v (image to video using new VAE Encoder)
tutorial video TBA. details in the main article
Comments (13)
I always get: object of type 'float' has no len()
Same error here. It appears to be attempting to calculate the 'length' of a data type that does not have one. I was able to get Kijai's adaptation of CogVideoX working, but not this.
len(cfg_schedule) == sample_steps
^^^^^^^^^^^^^^^^^
TypeError: object of type 'float' has no len()
I'd overlooked the video here, right above the discussion area. Found this in the comments:
"sounds like you need to update the pytorch to at least 2.5.0, use the .bat in the update folder (comfyui) update with dependencies."
I'd already been doing this, but it was deleting my symlink to 'models'. I was deleting the new folder and re-creating the symlink, as I didn't think any of the updated data would be within this folder. But after seeing the video comment, I re-did the update again, then moved the newly created 'models' folder, overwriting files in my symlinked folder. There is a noticeable difference in the GUI now, I have PyTorch 2.5.1+cu124, and things appear to progress further... yet I still get the same error.
@Meowy___Cat It was actually because the nodes updated and the fields in the sampler changed; this can happen with WIP projects. V7 is now released, which resolves the issue (adding the node back in again with the same settings).
The nodes changed due to this being a WIP research project; deleting the sampler and re-adding it with the same values resolved the issue. Updated in the V7 release :D
Thanks! Was about to call it for the night, then saw you responded. Had just done a fresh install and update of Comfy portable, this time without using symlinks. Tried a couple of the workflows in the v7 zip, and uninstalled and reinstalled custom nodes, but am still getting the same error. Perhaps I'll try later with a non-portable install. I've been using A1111 for a while, but I'm new to Comfy.
ComfyUI\custom_nodes\ComfyUI-MochiWrapper\mochi_preview\t2v_synth_mochi.py", line 241, in run
len(cfg_schedule) == sample_steps
^^^^^^^^^^^^^^^^^
TypeError: object of type 'float' has no len()
Oh, I re-read your responses, then right clicked on the Mochi sampler node and selected "Fix node (recreate)" - success! :)
@Meowy___Cat glad you got it solved !
My Comfy environment is completely broken after installing custom nodes from this workflow: "Torch not compiled with CUDA enabled". Cannot launch Comfy.
/1
1. Portable ComfyUI uses embedded Python, so it does not touch your Torch
2. I have the latest update here, it does not break anything.
3. If you are getting "Torch not compiled with CUDA enabled", you are not running ComfyUI with the portable startup.
These nodes will not create the problem you are experiencing. How did you start ComfyUI? You should be using the .bat files that come with the portable installation. If you don't use ComfyUI portable, you probably need to activate your venv. If you use system Python, it is up to you to maintain it.
It's much, much easier to use ComfyUI's embedded Python, which does not affect your system.
Since using Portable, I have never seen this error.
Contents of Requirements.txt
- accelerate
- einops
^ no torch change here
/2
If you are referring to any of my own nodes which are inside my own workflows here, they do not affect torch either.
But regardless, Torch is not being affected here. I would say, check that your ComfyUI startup is actually launching properly.
/3
If you are familiar with .bat files, this is how I make sure that requirements.txt is installed to the portable embedded Python properly:
https://github.com/MushroomFleet/ComfyUI-MochiWrapper/blob/main/install-portable.bat
I did a pull request to add this to the project, but it's a very simple script. You can put it in any custom node folder and it will make sure that the requirements are installed to the embedded Python, as explained. It's up to the author whether they wish to include it.
ah ty for warning, i'll do a fresh install for this.