Foundation Models
What Are Foundation Models?
Foundation models are large AI models pretrained on vast amounts of data that serve as a starting point for many different tasks. Rather than training a separate model from scratch for each specific task (like translation, summarization, or question answering), a foundation model is trained once on a massive, diverse dataset and then adapted to specific tasks with relatively little additional effort. The term was coined by Stanford researchers in 2021 to describe this paradigm shift. BERT, GPT, and similar models are foundation models. They learn general language understanding during pretraining and can then be fine-tuned or prompted to perform hundreds of different tasks. This approach is economical because training one large model and adapting it is far cheaper than training hundreds of specialized models. Foundation models now exist for text, images, code, audio, video, proteins, and scientific data, fundamentally changing how AI applications are built.
Technical Deep Dive
Foundation models are large-scale pretrained models that serve as adaptable bases for a wide range of downstream tasks, a paradigm formalized by Bommasani et al. (Stanford HAI, 2021). They are characterized by self-supervised pretraining on broad, diverse datasets at scale, followed by adaptation via fine-tuning, prompt engineering, in-context learning, or retrieval augmentation. The paradigm exploits the emergent capabilities that arise from scale, meaning abilities not explicitly trained but present in sufficiently large models. Key examples include language models (BERT, GPT series, PaLM, Llama), vision models (CLIP, DINOv2, Segment Anything), code models (Codex, StarCoder), and multimodal models (Flamingo, GPT-4V). Adaptation techniques include full fine-tuning, parameter-efficient methods (LoRA, QLoRA, prefix tuning, adapters), and zero/few-shot prompting. The foundation model paradigm raises important questions about homogenization risk, training data governance, compute concentration, and the transfer of biases across downstream applications.
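To make the parameter-efficiency point concrete, here is a minimal NumPy sketch of the LoRA idea: instead of updating a frozen pretrained weight matrix W, a low-rank update B @ A (rank r much smaller than the hidden size) is trained alongside it. The matrix size (768) and rank (8) are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen pretrained weight matrix (e.g. one attention projection).
d = 768
W = rng.standard_normal((d, d))

# LoRA: learn a rank-r update B @ A instead of updating all d*d weights.
# Only A and B are trained; W stays frozen.
r = 8
A = rng.standard_normal((r, d)) * 0.01  # small random init
B = np.zeros((d, r))                    # zero init: update starts as a no-op

def lora_forward(x):
    # Effective weight is W + B @ A, applied without materializing it.
    return x @ W.T + (x @ A.T) @ B.T

full_params = W.size                # 589,824 weights to fine-tune fully
lora_params = A.size + B.size       # 12,288 trainable LoRA weights
print(f"LoRA trains {lora_params / full_params:.2%} of the parameters")
```

For this one matrix, LoRA trains roughly 2% of the parameters; across a full model the savings are what make adapting a large foundation model feasible on modest hardware.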
Why It Matters
Foundation models are the reason you can use one AI system like ChatGPT or Claude for writing, coding, analysis, translation, and hundreds of other tasks instead of needing a separate specialized tool for each one.
Examples
- BERT: Google's bidirectional encoder-only transformer (2018) pretrained with masked language modeling, revolutionizing NLP benchmarks and establishing the pretrain-then-fine-tune paradigm for language understanding tasks
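The masked language modeling objective mentioned above can be sketched in a few lines of plain Python. This toy version splits on whitespace and always substitutes `[MASK]` (real BERT uses WordPiece tokens and replaces masked positions with `[MASK]` only 80% of the time); the 15% masking rate matches the BERT paper.

```python
import random

random.seed(0)

# Toy corpus sentence; a real tokenizer would produce subword pieces.
tokens = "the model learns general language understanding during pretraining".split()

MASK = "[MASK]"
mask_prob = 0.15  # BERT masks ~15% of input tokens

def mask_tokens(tokens, mask_prob, rng):
    """Return (masked_input, targets). targets holds the original token at
    masked positions and None elsewhere (no loss is computed there)."""
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # the model is trained to predict this
        else:
            masked.append(tok)
            targets.append(None)  # position ignored by the training loss
    return masked, targets

masked, targets = mask_tokens(tokens, mask_prob, random)
print(masked)
print(targets)
```

The model sees `masked` as input and is trained to recover the tokens in `targets` from bidirectional context, which is what forces it to learn general language representations during pretraining.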
Related Concepts
Part of
- Deep Learning (DL) (paradigms)
- Transformers (enables)
Includes
- Reasoning Models (enables)
- Large Language Models (LLM) (e.g.)
- Agentic AI (enables)