39 lines
2.1 KiB
Markdown
39 lines
2.1 KiB
Markdown
# 10× inference speedup with high-compression autoencoder
|
||
|
||
|
||
The high computational cost of training video generation models arises from the
|
||
large number of tokens and the dominance of attention computation. To further reduce training expenses,
|
||
we explore training video generation models with high-compression autoencoders (Video DC-AEs). As shown in the comparason below, by switching to the Video DC-AE with a much higher downsample ratio (4 x 32 x 32), we can afford to further reduce the patch size to 1 and still achieve __5.2× speedup in training throughput__ and __10x speedup during inference__:
|
||
|
||

|
||
|
||
|
||
Nevertheless, despite the advantanges in drastically lower computation costs, other challenges remain. For instance, larger channels low down convergance. Our generation model adapted with a 128-channel Video DC-AE for 25K iterations achieves a loss level of 0.5, as compared to 0.1 from the initialization model. While the fast video generation model underperforms the original, it still captures spatial-temporal
|
||
relationships. We release this model to the research community for further exploration.
|
||
|
||
Checkout more details in our [report](https://arxiv.org/abs/2503.09642v1).
|
||
|
||
## Model Download
|
||
|
||
Download from 🤗 [Huggingface](https://huggingface.co/hpcai-tech/Open-Sora-v2-Video-DC-AE):
|
||
|
||
```bash
|
||
pip install "huggingface_hub[cli]"
|
||
huggingface-cli download hpcai-tech/Open-Sora-v2-Video-DC-AE --local-dir ./ckpts
|
||
```
|
||
|
||
## Inference
|
||
|
||
To inference on our fast video generation model:
|
||
|
||
```bash
|
||
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/high_compression.py --prompt "The story of a robot's life in a cyberpunk setting."
|
||
```
|
||
|
||
## Training
|
||
Follow this [guide](./train.md#prepare-dataset) to parepare the __DATASET__ for training.
|
||
Then, you may train your own fast generation model with the following command:
|
||
```bash
|
||
torchrun --nproc_per_node 8 scripts/diffusion/train.py configs/diffusion/train/high_compression.py --dataset.data-path datasets/pexels_45k_necessary.csv
|
||
```
|