Image Generation Case Study

A Comprehensive Study of Open-Source Diffusion Models and API Services

About This Project

This repository is a case study of how existing open-source diffusion models perform at text-to-image generation and image editing. It provides a unified interface for comparing and experimenting with 14 state-of-the-art text-to-image models, and it also supports closed-source API services.

Whether you're a researcher exploring generative AI, a developer integrating image generation into an application, or an enthusiast experimenting with different models, this project gives you one place to run text-to-image models, either through a web UI or a Python API, using local open-source models or hosted APIs.

Key Features

  • 🎨 Multi-model Comparison: Generate images with multiple models simultaneously for side-by-side comparison
  • 🔓 14 Open-Source Models: Including Stable Diffusion variants, FLUX.1, SDXL, CogView, PixArt, and more
  • 🔒 Closed-Source API Integration: Support for OpenAI DALL-E, Google Imagen, Bytedance Cloud, and Kling AI
  • 🖥️ Gradio Web UI: User-friendly interface for interactive image generation
  • ⚙️ Configurable Parameters: Full control over inference steps, guidance scale, image size, and seed
  • 💾 Auto-save Organization: Images are saved automatically into timestamped folders, together with a JSON file recording the generation config
  • 🚀 Multi-GPU Support: Automatic device mapping for utilizing multiple GPUs efficiently
  • 📊 Memory Efficient: Sequential generation to manage VRAM usage
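
The auto-save feature above (timestamped folders plus a generation config JSON) can be pictured with a minimal sketch. This is not the repository's actual code: the function names `make_run_dir` and `save_config`, the `YYYYMMDD_HHMMSS` folder format, and the `config.json` file name are all assumptions chosen for illustration.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(root: Path, now: Optional[datetime] = None) -> Path:
    """Create and return a timestamped folder, e.g. root/20250102_030405.

    The folder name format is an assumption, not the repository's scheme.
    """
    now = now or datetime.now()
    run_dir = root / now.strftime("%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def save_config(run_dir: Path, config: dict) -> Path:
    """Write the generation settings next to the images (hypothetical file name)."""
    path = run_dir / "config.json"
    path.write_text(json.dumps(config, indent=2, sort_keys=True), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = make_run_dir(Path(tmp))
        config = {"prompt": "A fantasy landscape", "seed": 42, "guidance_scale": 7.5}
        print(save_config(run_dir, config))
```

Storing the config beside the images means any output can be regenerated later from its recorded prompt, seed, and parameters.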

Supported Models

Open-Source Text-to-Image Models (14 Total)

From lightweight ~3 GB models to state-of-the-art ~16 GB models:

⚡ Fast & Efficient

  • Stable Diffusion 2.1 (~4 GB) - Classic, reliable
  • PixArt-XL 2 (~4 GB) - Fast generation
  • Sana 600M (~3 GB) - Lightweight

🎯 High Quality

  • Stable Diffusion XL (~7 GB) - Higher quality
  • FLUX.1 Dev (~16 GB) - State-of-the-art
  • Stable Diffusion 3 (~9 GB) - Latest SD3

🌏 Multilingual

  • CogView3 Plus 3B (~6 GB) - Multilingual
  • CogView4 6B (~11 GB) - Latest CogView
  • HunyuanDiT v1.2 (~10 GB) - Chinese + English

🔬 Specialized

  • Stable Cascade (~10 GB) - Multi-stage
  • Qwen Image (~8 GB) - Multimodal
  • UniDiffuser v1 (~5 GB) - Unified model
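
When deciding which of the models listed above to try on a given GPU, a small filter over the approximate sizes can help. The sketch below copies the sizes from the list and treats them as rough memory footprints, which is an assumption: actual VRAM use depends on precision, resolution, and offloading. The function name `models_within_budget` and the 1 GB headroom default are hypothetical.

```python
from typing import List

# Approximate sizes in GB, copied from the list above.
MODEL_SIZES_GB = {
    "Stable Diffusion 2.1": 4,
    "PixArt-XL 2": 4,
    "Sana 600M": 3,
    "Stable Diffusion XL": 7,
    "FLUX.1 Dev": 16,
    "Stable Diffusion 3": 9,
    "CogView3 Plus 3B": 6,
    "CogView4 6B": 11,
    "HunyuanDiT v1.2": 10,
    "Stable Cascade": 10,
    "Qwen Image": 8,
    "UniDiffuser v1": 5,
}


def models_within_budget(budget_gb: float, headroom_gb: float = 1.0) -> List[str]:
    """Return model names whose listed size fits in budget_gb minus headroom_gb.

    Results are ordered smallest first, with ties broken alphabetically.
    """
    limit = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in MODEL_SIZES_GB.items() if size <= limit]
    return [name for _size, name in sorted(fitting)]


if __name__ == "__main__":
    for name in models_within_budget(8):
        print(name)
```

The headroom parameter leaves space for activations and other processes; tune it to your own setup rather than treating the default as a recommendation.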

Closed-Source API Services

  • OpenAI DALL-E: DALL-E 2 & DALL-E 3, with quality and style controls on DALL-E 3 (up to 1792x1024 or 1024x1792)
  • Google Imagen: Vertex AI Imagen for photorealistic generation (up to 1536x1536)
  • Bytedance Cloud: Volcano Engine text-to-image API (up to 2048x2048)
  • Kling AI: High-quality generation models (up to 2048x2048)

Quick Start

Installation

# Clone the repository
git clone https://github.com/Bili-Sakura/image-generation-case-study.git
cd image-generation-case-study

# Install dependencies
pip install -r requirements.txt

# Optional: Install API dependencies for closed-source models
pip install -r requirements_api.txt

Usage

Option 1: Gradio Web UI (Recommended)

python run.py

This starts the Gradio server and serves the text-to-image UI at http://localhost:7860. Open that address in a browser if one does not open automatically.

Option 2: Python API

from src.model_manager import get_model_manager
from src.inference import generate_image

# Load model
manager = get_model_manager()
manager.load_model("stabilityai/stable-diffusion-2-1")

# Generate
image, filepath, seed = generate_image(
    model_id="stabilityai/stable-diffusion-2-1",
    prompt="A fantasy landscape with mountains and rivers",
    num_inference_steps=50,
    guidance_scale=7.5,
    seed=42
)
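
To compare several models on the same prompt from Python, the load and generate calls above can be wrapped in a loop that handles one model at a time. The helper below is a sketch, not part of the repository: `compare_sequentially` is a hypothetical name, and it takes the load and generate functions as arguments so that it makes no assumptions about the repository's internals beyond the two calls shown above.

```python
from typing import Any, Callable, Dict, Iterable


def compare_sequentially(
    model_ids: Iterable[str],
    load: Callable[[str], Any],
    generate: Callable[..., Any],
    prompt: str,
    seed: int = 42,
    **params: Any,
) -> Dict[str, Any]:
    """Run the same prompt and seed through each model, one after another.

    A failure in one model is recorded as the exception object instead of
    aborting the whole comparison.
    """
    results: Dict[str, Any] = {}
    for model_id in model_ids:
        try:
            load(model_id)
            results[model_id] = generate(
                model_id=model_id, prompt=prompt, seed=seed, **params
            )
        except Exception as exc:  # keep going so the other models still run
            results[model_id] = exc
    return results
```

With the repository's API this would presumably be called as `compare_sequentially(ids, manager.load_model, generate_image, prompt="...", num_inference_steps=50)`, where each successful entry holds whatever `generate_image` returns. Using one fixed seed across models keeps the comparison repeatable, although different architectures will not produce matching images from the same seed.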

Generation Parameters

  • Inference Steps: 10-100 (default: 50) - More steps generally improve quality, with diminishing returns, but take longer
  • Guidance Scale: 1.0-20.0 (default: 7.5) - Higher values follow the prompt more closely
  • Image Sizes: 512px to 1280px with multiple presets
  • Seed Control: Set a fixed seed for reproducibility, or -1 for a random seed
  • Negative Prompts: Supported on compatible models
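
The ranges and defaults above can be checked before a generation call. The following is a minimal sketch under stated assumptions: the `GenerationParams` class, the `normalize` function, the 512x512 default size, and the 32-bit seed range are all hypothetical and are not taken from the repository. Only the numeric limits and the meaning of -1 come from the list above.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GenerationParams:
    num_inference_steps: int = 50   # documented range: 10-100
    guidance_scale: float = 7.5     # documented range: 1.0-20.0
    width: int = 512                # documented range: 512-1280 px
    height: int = 512
    seed: int = -1                  # -1 means "pick a random seed"


def normalize(params: GenerationParams,
              rng: Optional[random.Random] = None) -> GenerationParams:
    """Validate the documented ranges and replace seed=-1 with a concrete seed."""
    if not 10 <= params.num_inference_steps <= 100:
        raise ValueError("num_inference_steps must be in 10-100")
    if not 1.0 <= params.guidance_scale <= 20.0:
        raise ValueError("guidance_scale must be in 1.0-20.0")
    for side in (params.width, params.height):
        if not 512 <= side <= 1280:
            raise ValueError("image sides must be in 512-1280 px")
    seed = params.seed
    if seed == -1:
        source = rng if rng is not None else random
        seed = source.randrange(2**32)  # assumed seed range
    return replace(params, seed=seed)
```

Resolving -1 to a concrete value before generating means the seed that was actually used can be logged, so a random run can still be reproduced afterwards.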

Citation

If you find this repository useful, please cite it as:

@misc{bili_sakura_image_generation_case_study,
  author       = {Bili-Sakura},
  title        = {Image Generation Case Study},
  year         = {2025},
  howpublished = {\url{https://github.com/Bili-Sakura/image-generation-case-study}}
}