Variational Autoencoders (VAEs)
What are Variational Autoencoders (VAEs)?
Variational autoencoders are a type of generative AI model that learns to create new data by first compressing existing examples into a compact mathematical representation and then expanding that representation back into new, realistic outputs. Imagine compressing a photo into a small set of numbers that capture its essential features (the lighting, color palette, the position of objects) and then tweaking those numbers slightly to generate a brand new photo with different characteristics. VAEs work exactly this way: they learn a smooth, organized 'map' of all possible variations in their training data, and you can generate new content by picking points on that map. Unlike GANs, which use a competitive process, VAEs use a probabilistic approach rooted in Bayesian statistics. They are a foundational component of modern image generation. The 'latent diffusion' technique used by Stable Diffusion operates in a compressed space created by a VAE encoder, combining the best of both approaches.
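The "picking points on the map" idea can be illustrated with a small sketch. Assuming you already have two latent codes (the compact sets of numbers produced by a trained encoder, which is not shown here), linearly blending them yields intermediate points that a decoder could turn into images that morph smoothly between the two originals. The function below is a hypothetical helper, not part of any particular library:

```python
def interpolate_latents(z_a, z_b, steps=5):
    """Walk the latent 'map': blend two latent codes (lists of floats)
    into a sequence of intermediate codes, from z_a (t=0) to z_b (t=1).
    Each intermediate code could be fed to a trained VAE decoder to
    generate an image partway between the two originals."""
    return [
        [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]
        for t in (i / (steps - 1) for i in range(steps))
    ]

# Blending two 2-dimensional latent codes in three steps:
codes = interpolate_latents([0.0, 0.0], [1.0, 1.0], steps=3)
```

Because VAE training organizes the latent space to be smooth, these in-between points tend to decode to plausible outputs rather than garbage, which is what makes this kind of interpolation useful in practice.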
Technical Deep Dive
Variational autoencoders (VAEs), introduced by Kingma and Welling (2013), are probabilistic generative models that learn latent representations by jointly training an encoder (inference network) that maps inputs to a distribution in latent space and a decoder (generative network) that reconstructs inputs from latent samples. The model maximizes the evidence lower bound (ELBO), which balances reconstruction fidelity against the KL divergence between the approximate posterior and a prior distribution (typically a standard Gaussian). The reparameterization trick enables gradient-based optimization through the stochastic sampling step. Key variants include conditional VAEs (class-conditioned generation), beta-VAE (disentangled representations via KL weight tuning), VQ-VAE (vector-quantized discrete latent spaces), and hierarchical VAEs (multi-scale latent hierarchies). VAEs produce smoother but sometimes blurrier outputs than GANs. Their most impactful modern application is as the compression backbone in latent diffusion models (Stable Diffusion): the VAE encoder maps images to a compact latent space in which diffusion operates efficiently, and the VAE decoder reconstructs final images from denoised latents.
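The two core pieces described above, the reparameterization trick and the ELBO's KL term, can be written down compactly. This is a minimal, framework-free sketch for a one-dimensional diagonal-Gaussian posterior, using the standard closed-form KL between N(mu, sigma^2) and N(0, 1); real implementations would sum these terms over latent dimensions and batch elements:

```python
import math
import random

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).
    Isolating the randomness in eps lets gradients flow through
    mu and log_var (the reparameterization trick)."""
    eps = random.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)) for one latent dimension:
    0.5 * (sigma^2 + mu^2 - 1 - log sigma^2), with log_var = log sigma^2."""
    return 0.5 * (math.exp(log_var) + mu ** 2 - 1.0 - log_var)

def elbo(recon_log_likelihood, mu, log_var):
    """ELBO = E[log p(x|z)] - KL(q(z|x) || p(z)).
    The first term rewards reconstruction fidelity; the KL term
    pulls the posterior toward the standard-Gaussian prior."""
    return recon_log_likelihood - kl_to_standard_normal(mu, log_var)
```

Note that when the posterior exactly matches the prior (mu = 0, log_var = 0) the KL penalty vanishes, which is the tension ELBO training balances: a posterior close to the prior keeps the latent space well organized, while an expressive posterior improves reconstructions.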
Why It Matters
VAEs are a key component of Stable Diffusion's image generation pipeline, enable drug discovery by generating novel molecular structures, and power face editing features that let you smoothly modify attributes like age and expression in photos.
Related Concepts
Part of
- Generative AI (GenAI) (includes)
Connected to
- Autoencoders (extends to)