This is an EDUCATIONAL project that provides utilities for DreamBooth LoRA training of Stable Diffusion 3 (SD3) in under 16GB of GPU VRAM. This means you can successfully try out this project using a free-tier Colab Notebook instance. 🤗
> [!NOTE]
> SD3 is gated, so you need to agree to share your contact info to access the model before using it with Diffusers. Once you have access, you need to log in so your system knows you're authorized. Use the command below to log in:
>
> ```bash
> hf auth login
> ```
This will also allow us to push the trained model parameters to the Hugging Face Hub platform.
For setup, inference code, and details on how to run the training, please follow the Colab Notebook provided above.
We make use of several techniques to make this possible:

* We precompute the text embeddings and serialize them for reuse via the `compute_embeddings.py` script. We use an 8bit T5 (as introduced in LLM.int8()) to reduce memory requirements to ~10.5GB.
* In the `train_dreambooth_sd3_lora_miniature.py` script, we make use of:
  * The `bitsandbytes` library.
  * Memory-efficient attention through `F.scaled_dot_product_attention()`.

Computing the text embeddings is arguably the most memory-intensive part of the pipeline, as SD3 employs three text encoders. If we run them in FP32, it will take about 20GB of VRAM. With FP16, we are down to 12GB.
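The FP32 vs. FP16 figures above follow from simple per-parameter arithmetic. Here is a back-of-envelope sketch; the parameter counts are approximate assumptions for illustration (CLIP-L ~124M, OpenCLIP-bigG ~695M, T5-XXL encoder ~4.76B), and the estimate covers weights only, ignoring activations:

```python
# Rough, weights-only VRAM estimate for SD3's three text encoders.
# Parameter counts below are approximate assumptions, not exact figures.
PARAMS = {
    "clip_l": 124e6,   # CLIP-L text encoder (assumed size)
    "clip_g": 695e6,   # OpenCLIP-bigG text encoder (assumed size)
    "t5_xxl": 4.76e9,  # T5-XXL encoder (assumed size)
}

def vram_gb(bytes_per_param: int) -> float:
    """Estimate the weights-only footprint in GiB for a given precision."""
    total_params = sum(PARAMS.values())
    return total_params * bytes_per_param / 1024**3

# FP32 stores 4 bytes per parameter; FP16 stores 2, halving the footprint.
print(f"FP32: ~{vram_gb(4):.1f} GB, FP16: ~{vram_gb(2):.1f} GB")
```

The result lands in the same ballpark as the numbers quoted above (~20GB in FP32, roughly half that in FP16); the real pipeline also needs VRAM for activations, which is why the measured FP16 figure is somewhat higher than the weights-only estimate.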
This project is educational. It exists to showcase the possibility of fine-tuning a big diffusion system on consumer GPUs, but additional components might have to be added to obtain state-of-the-art performance. Below are some commonly known gotchas that users should be aware of:
Hopefully, this project gives you a template to extend it further to suit your needs.