Alignment is the broad research program of making AI systems do what we want. Narrow alignment means getting a chatbot to refuse harmful requests, avoid hallucinations, and stay on topic. Broad alignment means ensuring that future, more powerful AI systems pursue goals compatible with human flourishing, do not acquire dangerous instrumental goals (power, self-preservation, resource accumulation), and remain corrigible, meaning humans can still correct them. The field uses techniques such as RLHF, constitutional AI, interpretability research, red-teaming, and capability evaluations. The major AI labs (Anthropic, OpenAI, Google DeepMind) all have alignment teams. By 2026, narrow alignment is largely solved for current-generation models; broad alignment for future, more capable systems remains an open research problem.
What is AI Alignment?
The discipline of ensuring AI systems behave in ways that match human values and intentions, both in safety (not causing harm) and in usefulness.
RELATED TERMS
RLHF (Reinforcement Learning from Human Feedback)
The technique that turns a base LLM into a useful assistant — by having humans rate model responses and using that feedback to fine-tune behavior.
LLM (Large Language Model)
An AI system trained on massive text datasets to predict and generate human-like text — the technology behind ChatGPT, Claude, Gemini, and most modern AI chatbots.