What is AI Alignment? — AI Glossary

Alignment is the broad research program of making AI systems do what humans intend. It is usually discussed at two scopes. Narrow alignment concerns the behavior of today's deployed models: getting a chatbot to refuse harmful requests, avoid hallucinations, and stay on topic. Broad alignment concerns future, more powerful AI systems. It asks whether they will pursue goals compatible with human flourishing, avoid acquiring dangerous instrumental goals (such as power, self-preservation, or resource accumulation), and remain corrigible, meaning humans can still correct or shut them down.

The field uses techniques such as reinforcement learning from human feedback (RLHF), Constitutional AI, interpretability research, red-teaming, and capability evaluations. The major AI labs (Anthropic, OpenAI, Google DeepMind) all have alignment teams. By 2026, narrow alignment techniques work well enough for most everyday uses of current-generation models, although failures such as jailbreaks and hallucinations still occur. Broad alignment for future, more capable systems remains an open research problem.

RELATED TERMS