Stable Diffusion XL (SDXL) was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://huggingface.co/papers/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.
The abstract from the paper is:
*We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.*
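Because SDXL conditions on a second text encoder, the pipeline accepts a separate `prompt_2` that is routed to it (if `prompt_2` is omitted, `prompt` is sent to both encoders). A minimal sketch, assuming the official `stabilityai/stable-diffusion-xl-base-1.0` checkpoint and a CUDA device:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# `prompt` is sent to the first text encoder (CLIP ViT-L) and `prompt_2`
# to the second (OpenCLIP ViT-bigG), so each encoder can receive
# different parts of the overall description.
image = pipeline(
    prompt="a photo of an astronaut riding a horse on mars",
    prompt_2="cinematic, dramatic lighting, highly detailed",
).images[0]
```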
Here are some tips for getting better results with SDXL (see the sketch after this list):

- Set `use_karras_sigmas=True` or `lu_lambdas=True` to improve image quality.
- Set `euler_at_final=True` if you're using a solver with uniform step sizes (DPM++ 2M or DPM++ 2M SDE).
- Use `negative_original_size`, `negative_crops_coords_top_left`, and `negative_target_size` to negatively condition the model on image resolution and cropping parameters.

> [!TIP]
> To learn how to use SDXL for various tasks, how to optimize performance, and other usage examples, take a look at the Stable Diffusion XL guide.
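A minimal sketch combining the scheduler settings and the negative size conditioning from the tips above, assuming a DPM++ 2M solver via `DPMSolverMultistepScheduler`, the `stabilityai/stable-diffusion-xl-base-1.0` checkpoint, and a CUDA device; the specific sizes passed are illustrative:

```python
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in DPM++ 2M with Karras sigmas for better quality at low step counts;
# `euler_at_final=True` applies here because DPM++ 2M uses uniform step sizes.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config,
    algorithm_type="dpmsolver++",
    use_karras_sigmas=True,
    euler_at_final=True,
)

# Negatively condition on a small original size so the model steers away
# from artifacts associated with low-resolution training images.
image = pipeline(
    prompt="a photo of an astronaut riding a horse on mars",
    negative_original_size=(512, 512),
    negative_target_size=(1024, 1024),
    num_inference_steps=25,
).images[0]
```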
Check out the [Stability AI](https://huggingface.co/stabilityai) Hub organization for the official base and refiner model checkpoints!
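A minimal sketch of the post-hoc refinement workflow described in the abstract, assuming the official `stabilityai/stable-diffusion-xl-base-1.0` and `stabilityai/stable-diffusion-xl-refiner-1.0` checkpoints and a CUDA device:

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

# Generate with the base model, then polish the result with the refiner
# in an image-to-image pass (a low strength preserves the composition).
image = base(prompt=prompt).images[0]
refined = refiner(prompt=prompt, image=image).images[0]
refined.save("astronaut.png")
```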
[[autodoc]] StableDiffusionXLPipeline
    - all
    - __call__

[[autodoc]] StableDiffusionXLImg2ImgPipeline
    - all
    - __call__

[[autodoc]] StableDiffusionXLInpaintPipeline
    - all
    - __call__