Stable Diffusion

Stable Diffusion is Stability AI's open-weight family of text-to-image models that reshaped AI image generation. Available in multiple variants from SDXL to SD 3.5, it supports local deployment, fine-tuning, and a large community ecosystem of LoRAs and plugins.


Categories

Image & Photography, Development & IT

Overview

Stable Diffusion is Stability AI's family of open-weight text-to-image models that transformed the generative AI landscape. Unlike proprietary alternatives such as Midjourney and DALL-E, Stable Diffusion's weights are publicly available, enabling local deployment on consumer hardware, community fine-tuning, and extensive customization within the terms of each model's license.

First released in August 2022 by Stability AI in collaboration with Runway and CompVis, Stable Diffusion operates as a latent diffusion model (LDM). Instead of running diffusion on pixels directly, it uses a variational autoencoder (VAE) to compress images into a lower-dimensional latent space, denoises there, and decodes the result back to pixels. This makes generation significantly more efficient than pixel-space diffusion.
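To make the efficiency gain concrete, the arithmetic below compares the size of a pixel tensor with its latent counterpart. It is a rough sketch that assumes the 8x spatial downsampling and 4 latent channels of the original SD 1.x VAE; those figures are not stated above, and later models differ (SD 3 uses 16 latent channels). The function names are illustrative only.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent tensor.

    downsample and latent_channels are assumptions taken from the
    SD 1.x VAE, not values given in the text above.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, image_channels=3, **vae_kwargs):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, **vae_kwargs)
    return (image_channels * height * width) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(512, 512))       # (4, 64, 64)
    print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the denoiser works on 48 times fewer values than a pixel-space model would at the same resolution.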

Architecture Evolution

SDXL, introduced in July 2023, pairs a 2.6 billion parameter UNet backbone with a dual text encoder pipeline (OpenCLIP ViT-bigG and CLIP ViT-L), for roughly 3.5 billion parameters in the base model as a whole. It features a two-stage process: a base model generates initial latents, and a separate refiner model adds fine detail by partially re-noising and denoising them (the SDEdit technique). SDXL natively supports 1024x1024 resolution and multiple aspect ratios via size and crop conditioning.
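When working with multiple aspect ratios, a common practice is to keep the pixel count near 1024x1024 and snap both sides to a fixed multiple. The helper below is a sketch of that idea; the 64-pixel multiple and the 1024x1024 budget are assumptions based on the published SDXL training setup, and `sdxl_style_resolution` is a hypothetical name, not part of any library.

```python
import math


def sdxl_style_resolution(aspect_ratio, pixel_budget=1024 * 1024, multiple=64):
    """Pick (width, height) close to pixel_budget for a width/height ratio.

    Both sides are rounded to the nearest `multiple`. pixel_budget and
    multiple are assumptions, not values stated in the text above.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    width = round(math.sqrt(pixel_budget * aspect_ratio) / multiple) * multiple
    height = round(math.sqrt(pixel_budget / aspect_ratio) / multiple) * multiple
    return max(width, multiple), max(height, multiple)


if __name__ == "__main__":
    for name, ratio in [("1:1", 1.0), ("4:3", 4 / 3), ("16:9", 16 / 9)]:
        print(name, sdxl_style_resolution(ratio))
```

For 16:9 this yields 1344x768, which stays close to one megapixel while giving the model a shape similar to those it saw during training.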

SD 3.0, announced in February 2024 with SD 3 Medium weights following in June 2024, marked a fundamental architectural shift from the traditional UNet to a Multimodal Diffusion Transformer (MMDiT) trained with rectified flow. It employs three text encoders in parallel (CLIP-G/14, CLIP-L/14, T5-XXL), which together contribute roughly 5 billion parameters dedicated to text understanding.
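Rectified flow can be illustrated without a neural network. In the sketch below, data and noise are joined by a straight line, the training target is the constant velocity along that line, and sampling is Euler integration from noise back to data. The convention that t=0 is data and t=1 is noise is an assumption of this sketch, and the "oracle" velocity function stands in for the trained MMDiT.

```python
def interpolate(x0, noise, t):
    """Point on the straight path: t=0 gives data, t=1 gives pure noise."""
    return [(1 - t) * a + t * e for a, e in zip(x0, noise)]


def velocity_target(x0, noise):
    """Regression target for the model: d(x_t)/dt, constant along the path."""
    return [e - a for a, e in zip(x0, noise)]


def euler_sample(noise, velocity_fn, steps=4):
    """Integrate from t=1 (noise) down to t=0 (data) with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    data = [1.0, -2.0]
    noise = [0.5, 0.25]
    # Oracle that returns the exact velocity; a real model only approximates it.
    oracle = lambda x, t: velocity_target(data, noise)
    print(euler_sample(noise, oracle, steps=4))
```

Because the ideal path is straight, even a handful of Euler steps recovers the data exactly when the velocity is exact, which is part of why flow-based models are attractive for few-step sampling.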

SD 3.5, released in October 2024, addressed SD 3.0's shortcomings with a refined architecture and training recipe. It excels in prompt adherence, image quality, typography, and diverse output styles. Adding QK-normalization to the MMDiT architecture improved training stability and generation consistency.
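The stabilizing effect of QK-normalization can be seen in a single attention logit. The sketch below applies RMS normalization to the query and key before their dot product; the learnable per-head scale used in real implementations is omitted, and the function names are illustrative rather than taken from any codebase.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Scale a vector so its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]


def attention_logit(q, k, qk_norm=True):
    """Scaled dot-product logit for one query/key pair.

    With qk_norm=True the magnitude is bounded by sqrt(len(q)),
    no matter how large the raw activations grow during training.
    """
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


if __name__ == "__main__":
    q = [100.0, 200.0, -300.0, 400.0]
    k = [1000.0, -2000.0, 3000.0, 4000.0]
    print(attention_logit(q, k, qk_norm=False))  # very large logit
    print(attention_logit(q, k, qk_norm=True))   # bounded logit
```

Keeping logits bounded prevents the softmax from saturating, which is one plausible reading of the stability improvement described above.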

Model Variants

SD 3.5 Large, at 8.1 billion parameters, is the flagship model, offering the highest quality and prompt adherence at 1 megapixel resolution. It requires approximately 18GB of VRAM for inference on RTX 4090-class hardware. SD 3.5 Large Turbo is a distilled version that generates high-quality images in just 4 sampling steps. SD 3.5 Medium, at 2.5 billion parameters, is designed for consumer hardware, requiring only 9.9GB of VRAM (excluding text encoders), and supports resolutions from 0.25 to 2 megapixels.
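A quick way to sanity-check VRAM figures like these is to multiply the parameter count by the bytes each parameter occupies. The sketch below estimates weight storage only; it assumes decimal gigabytes and ignores activations, the VAE, and the text encoders, so real requirements such as those quoted above will be higher.

```python
# Bytes per parameter for common precisions (standard sizes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate memory for model weights alone, in decimal GB."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in [("SD 3.5 Large", 8.1e9), ("SD 3.5 Medium", 2.5e9)]:
        print(f"{name}: {weight_memory_gb(params):.1f} GB in fp16")
```

At 16-bit precision the 8.1 billion parameter model needs about 16.2 GB for weights alone, which is consistent with a total in the neighborhood of the 18GB quoted above.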

Key Capabilities

Text-to-image generation creates images from natural language prompts, handling complex multi-subject scenes and spatial relationships. Image-to-image transforms an existing image according to a prompt and a denoising strength that controls how far the result may depart from the input. Inpainting fills masked regions in a context-aware way, and outpainting extends an image beyond its original boundaries. ControlNet integration offers precise composition control using canny edges, depth maps, and pose estimation.
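Denoising strength is easiest to understand as a choice of how many sampler steps actually run. The sketch below mirrors the mapping used by common image-to-image pipelines as far as I am aware (this is an assumption, not something stated above): the input is noised part of the way, and only the remaining fraction of the schedule is denoised. `img2img_schedule` is a hypothetical helper name.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (start_index, steps_to_run) for an image-to-image pass.

    strength=0.0 keeps the input untouched (no steps run);
    strength=1.0 ignores the input and runs the full schedule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = num_inference_steps - steps_to_run
    return start_index, steps_to_run


if __name__ == "__main__":
    # With 40 scheduled steps and strength 0.75, the first 10 are skipped.
    print(img2img_schedule(40, 0.75))  # (10, 30)
```

Lower strengths therefore run faster and preserve more of the source image, while higher strengths give the prompt more influence.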

Use Cases

Stable Diffusion is used in creative design and art (concept art and illustration), marketing and advertising (product visuals and campaign creatives), game development (concept art and texture creation), e-commerce (product photography and lifestyle imagery), and research and education (studies of generative models).

Licensing and Ecosystem

SD 3.5 is released under the Stability AI Community License. It is free for non-commercial use, scientific research, and commercial use by individuals and organizations with under $1 million in total annual revenue. Generated output images belong to the user and can be used commercially under all license tiers. The open ecosystem includes popular interfaces such as ComfyUI, Automatic1111 Web UI, and Fooocus. The community has produced thousands of LoRAs, textual inversions, and fine-tuned checkpoints, shared on platforms such as Civitai and Hugging Face.
