Overview
Stable Diffusion is Stability AI's family of open-weight text-to-image models that transformed the generative AI landscape. Unlike proprietary alternatives such as Midjourney and DALL-E, Stable Diffusion's weights are publicly available, enabling local deployment on consumer hardware, community fine-tuning, and extensive customization within the terms of each model's license.
First released in August 2022 by Stability AI in collaboration with Runway and CompVis, Stable Diffusion operates as a latent diffusion model (LDM). Instead of running diffusion on pixels directly, it uses a variational autoencoder (VAE) to encode images into a much smaller latent space and denoises there, making generation significantly more efficient than pixel-space diffusion models.
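To make the efficiency argument concrete, the following sketch computes how much smaller a latent is than the image it encodes. The downsampling factor of 8 and the 4 latent channels are assumptions taken from the original Stable Diffusion VAE and are not stated in the text above; later models use different settings (SD 3 uses a 16-channel latent, for example). The function names are illustrative only.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, h, w) of the latent for an RGB image of the given size.

    downsample and latent_channels are assumed values (original SD VAE).
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB pixel grid."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (c * h * w)


shape = latent_shape(512, 512)
ratio = compression_ratio(512, 512)
print(shape, ratio)  # (4, 64, 64) 48.0
```

Under these assumptions the denoising network works on 48 times fewer values than a pixel-space model would at 512x512.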
Architecture Evolution
SDXL, introduced in July 2023, is a 3.5 billion parameter base model built on an enlarged UNet backbone with a dual text encoder pipeline (OpenCLIP ViT-bigG and CLIP ViT-L). It features a two-stage process: a base model generates initial latents, and a separate refiner model adds high-quality details through SDEdit. SDXL natively supports 1024x1024 resolution and multiple aspect ratios via size and crop conditioning.
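As a rough illustration of size and crop conditioning, the sketch below packs the three pieces of information (the original image size, the crop offset, and the target size) into a single 6-value tuple that would then be embedded and fed to the model alongside the timestep. The ordering follows the convention used in the Hugging Face diffusers SDXL pipeline as far as I am aware; the function name and validation are hypothetical and not part of any library.

```python
def size_crop_conditioning(original_size, crop_top_left, target_size):
    """Flatten three integer pairs into one 6-value conditioning tuple.

    Illustrative helper: the order (original size, crop offset, target size)
    is an assumption modeled on common SDXL implementations.
    """
    pairs = (
        ("original_size", original_size),
        ("crop_top_left", crop_top_left),
        ("target_size", target_size),
    )
    for name, pair in pairs:
        if len(pair) != 2 or any(v < 0 for v in pair):
            raise ValueError(f"{name} must be a pair of non-negative integers")
    return tuple(original_size) + tuple(crop_top_left) + tuple(target_size)


# An uncropped 1024x1024 training image generated at 1024x1024.
cond = size_crop_conditioning((1024, 1024), (0, 0), (1024, 1024))
print(cond)  # (1024, 1024, 0, 0, 1024, 1024)
```

At inference time, passing a zero crop offset asks the model for a centered, uncropped composition, which is the usual default.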
SD 3.0, announced in February 2024, marked a fundamental architectural shift from the traditional UNet to a Multimodal Diffusion Transformer (MMDiT) trained with rectified flow. It employs three text encoders in parallel (CLIP-G/14, CLIP-L/14, T5-XXL), totaling 5 billion parameters dedicated to text understanding.
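Rectified flow connects data and noise with a straight line, z_t = (1 - t) * x0 + t * noise, and trains the network to predict the constant velocity noise - x0 along that line. The toy sketch below shows the idea on plain Python lists. The "oracle" velocity function stands in for a trained model and is purely an assumption for illustration; all names are hypothetical.

```python
def interpolate(x0, noise, t):
    """Point on the straight path between data (t=0) and noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity_target(x0, noise):
    """Training target: the constant velocity d z_t / dt = noise - x0."""
    return [b - a for a, b in zip(x0, noise)]


def euler_sample(noise, velocity_fn, steps):
    """Integrate from t=1 (pure noise) back to t=0 with fixed Euler steps."""
    z = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z


x0 = [0.5, -1.0]
noise = [2.0, 0.25]

# Stand-in for a trained network: it returns the exact velocity.
oracle = lambda z, t: velocity_target(x0, noise)

recovered = euler_sample(noise, oracle, steps=4)
print(recovered)
```

Because the path is straight, a model that predicts the velocity well can recover the data in very few steps, which is part of the appeal of this formulation.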
SD 3.5, released in October 2024, addressed SD 3.0's shortcomings with a refined architecture and training recipe. It excels in prompt adherence, image quality, typography, and diverse output styles. Adding QK-normalization to the MMDiT architecture improved training stability and generation consistency.
Model Variants
SD 3.5 Large, at 8.1 billion parameters, is the flagship model, offering superior quality and prompt adherence at 1 megapixel resolution. It requires approximately 18GB of VRAM for inference on RTX 4090-class hardware. SD 3.5 Large Turbo is a distilled version that generates high-quality images in just 4 sampling steps. SD 3.5 Medium, at 2.5 billion parameters, is designed for consumer hardware, requires only 9.9GB of VRAM, and supports resolutions from 0.25 to 2 megapixels.
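A back-of-the-envelope way to relate parameter counts to memory is to multiply by the bytes stored per parameter. The sketch below does this for the two parameter counts quoted above, assuming 16-bit weights (2 bytes each). It covers the diffusion model's weights only, so it gives a lower bound: real inference also needs memory for activations, the VAE, and the text encoders, which is presumably why the quoted VRAM figures are higher. The helper names are illustrative.

```python
# Parameter counts as quoted in the text above.
VARIANTS = {
    "SD 3.5 Large": 8.1e9,
    "SD 3.5 Medium": 2.5e9,
}


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory for the weights alone, in GB (assumes 16-bit weights by default)."""
    return num_params * bytes_per_param / 1e9


def variants_that_fit(vram_gb, bytes_per_param=2):
    """Variants whose weights alone fit in the given VRAM (a lower bound only)."""
    return [
        name
        for name, n in VARIANTS.items()
        if weight_memory_gb(n, bytes_per_param) <= vram_gb
    ]


for name, n in VARIANTS.items():
    print(f"{name}: about {weight_memory_gb(n):.1f} GB of weights at 16-bit")

print(variants_that_fit(12))
```

Passing `bytes_per_param=1` gives a similar rough estimate for 8-bit quantized weights.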
Key Capabilities
Text-to-image generation creates images from natural language prompts, handling complex multi-subject scenes and spatial relationships. Image-to-image transforms existing images based on prompt guidance and denoising strength. Inpainting fills masked regions in a context-aware way, and outpainting extends images beyond their original boundaries. ControlNet integration offers precise composition control using canny edges, depth maps, and pose estimation.
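Denoising strength in image-to-image is easiest to understand as "how much of the sampling schedule is actually run". The sketch below follows the convention used by common implementations such as the diffusers img2img pipelines, where a strength of 1.0 runs the full schedule (the input image is almost entirely replaced) and lower values skip the early, noisiest steps so more of the input survives. The function name is hypothetical.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (start_index, steps_to_run) for a given denoising strength.

    The input image is noised to the level of step `start_index`, then the
    remaining `steps_to_run` steps of the schedule are executed.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = max(num_inference_steps - steps_to_run, 0)
    return start_index, steps_to_run


print(img2img_steps(40, 0.75))  # (10, 30)
print(img2img_steps(40, 1.0))   # (0, 40)
```

One practical consequence is that low strength values also make generation faster, since fewer denoising steps are executed.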
Use Cases
Stable Diffusion is used in creative design and art (concept art and illustration), marketing and advertising (product visuals and campaign creatives), game development (concept art and texture creation), e-commerce (product photography and lifestyle imagery), and research and education (generative model studies).
Licensing and Ecosystem
SD 3.5 is released under the Stability AI Community License. It is free for non-commercial use, scientific research, and commercial use by organizations with under $1 million in total annual revenue. Generated output images belong to the user and can be used commercially under all license tiers. The open-source ecosystem includes popular interfaces like ComfyUI, Automatic1111 Web UI, and Fooocus. The community has produced thousands of LoRAs, textual inversions, and fine-tuned checkpoints on platforms like Civitai and Hugging Face.
Similar AI Tools
Stability AI Developer Platform
Stability AI is a developer platform for building image, video, audio, and 3D applications with APIs, sandbox tools, and credit-based pricing.
Clipchamp
Microsoft's AI-powered online video editor for creating, editing, and sharing HD videos with no expertise required.
ChatGPT Code Interpreter
OpenAI's sandboxed Python environment within ChatGPT that executes code, analyzes data, creates visualizations, and processes files through natural language conversations.
ParseHub Web Scraper
ParseHub is a powerful visual web scraping tool that extracts data from any website without writing code. It handles JavaScript, AJAX, pagination, and login forms, making it suitable for data analysts, marketers, researchers, and developers who need structured web data for lead generation, price monitoring, market intelligence, and data science workflows.
Rafter
Scan GitHub repositories for security vulnerabilities, secrets, and code issues with AI-powered SAST and actionable fix suggestions. Rafter connects to your GitHub with one click, delivers severity-tagged findings with plain-English remediation steps, and integrates with Claude Code, Cursor, and other AI coding agents.





