DJZ Highway Racing Tokyo - CivArchive (CivitAI Archive)

DJZ Highway Racing Tokyo - v2

V1: Video LoRA trained on 48 stills dumped from a car chase scene in a '90s Japanese action movie. (The image-based Hunyuan Video trainer was used.)

Prompt:

"xjxcar style video of custom cars, racing on tokyo highways at night, JDM style found footage"

V2 was trained on video clips, not images, to show the difference (same dataset).
Prompt:
"xjx style video of custom cars, racing on tokyo highways at night, JDM style found footage"

Description

Video-dataset version, trained with Replicate.

Comments (6)

Bibab0b · Feb 2, 2025 · 2 reactions

Is it trained on scenes from one of the "Shuto Kôsoku toraiaru" (aka "Shuto Kousoku Trial", aka "Megalopolis Expressway Trial") films?

driftjohnson

Author

Feb 2, 2025

I'm unsure which movie exactly, but this is from the same production team, with the same cinematography and locations. The chase in my dataset featured a white Mazda RX-7 (FC) and a yellow Skyline R32.

2790319 · Feb 3, 2025 · 3 reactions

Really cool. I assume a drift LoRA is in the works?

driftjohnson

Author

Feb 4, 2025

Yes.

I'm working on a few drift-themed LoRAs. With success on our side, more will be published very soon.

LDWorksDavid · Feb 3, 2025 · 2 reactions

Good job here. v2 IMO looks more solid (maybe that was obvious, but it's interesting to see a comparison between the two). Curious about the captions on your dataset: do you use the same caption for all scenes, or a different one for each? Now you need v3 = v2 + v1 images, hehe. Cool stuff, man!

driftjohnson

Author

Feb 4, 2025

For V2 I used "XJX video of custom cars [ Qwen VL caption ] JDM found footage in the style of XJX" as a rough blueprint for this one. It lines up almost exactly with the caption example in the description.
For V1 I had to use a simple catch-all prompt, which was applied to all the images.
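
As a rough illustration of the V2 caption blueprint, here is a minimal Python sketch that wraps an automatically generated caption in the fixed trigger-word template. The function name `build_caption` and the sample caption text are made up for the example, and the whitespace cleanup is an assumption, not necessarily what was done for this dataset.

```python
TRIGGER = "XJX"  # trigger word taken from the caption blueprint quoted above


def build_caption(auto_caption: str, trigger: str = TRIGGER) -> str:
    """Wrap an auto-generated caption in the fixed prefix/suffix template.

    `auto_caption` stands in for whatever a vision-language model returned
    for one clip; collapsing whitespace is an assumed cleanup step.
    """
    body = " ".join(auto_caption.split())  # collapse newlines and repeated spaces
    return (
        f"{trigger} video of custom cars {body} "
        f"JDM found footage in the style of {trigger}"
    )


if __name__ == "__main__":
    # Hypothetical caption, only to show the shape of the result.
    print(build_caption("two cars weave through traffic\non an elevated highway at night,"))
```

Keeping the prefix and suffix fixed while only the middle part varies per clip means every caption carries the trigger word twice, which matches the blueprint as quoted.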

I think image-based training for Hunyuan Video is more accurate to the ground truth, but video-clip-based training gets all those small motions correct.

Importantly, I train with 125-frame clip segments, and I drop boring or confusing segments from the dataset before training. Basically, split the video into segments of 125 frames, then use uniform frame sampling, with that parameter set to 8 frames.
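
The segmenting and sampling arithmetic can be sketched in a few lines of Python. This is only an illustration under assumptions: "uniform" is taken to mean evenly spaced frame indices that include the first and last frame of a segment, a trailing partial segment is dropped, and the function names are invented here. The trainer's own sampling may differ in detail.

```python
def split_into_segments(total_frames: int, segment_len: int = 125) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of full-length segments.

    Assumption: a trailing segment shorter than `segment_len` is dropped.
    """
    return [
        (start, start + segment_len)
        for start in range(0, total_frames - segment_len + 1, segment_len)
    ]


def uniform_frame_indices(start: int, end: int, num_samples: int = 8) -> list[int]:
    """Pick `num_samples` evenly spaced frame indices from [start, end)."""
    length = end - start
    if num_samples <= 1 or length <= 1:
        return [start]
    step = (length - 1) / (num_samples - 1)  # spacing that lands on both endpoints
    return [start + round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    segments = split_into_segments(total_frames=1000)
    # Dropping "boring or confusing" segments is a manual review step;
    # here every segment is kept.
    for seg_start, seg_end in segments:
        print((seg_start, seg_end), uniform_frame_indices(seg_start, seg_end))
```

With 125-frame segments and 8 samples, the picked frames sit roughly 18 frames apart, so each training sample spans the whole segment rather than a short burst.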

I might retrain with fully manual captions, but this will take time to get done.
If you liked this one, I have a new one with a more varied video dataset, currently training on an H100 (for 13 hrs).
If we are lucky, it will be ready tomorrow.

thanks for asking :)

LORA

Hunyuan Video

by driftjohnson
