The context window is the LLM's working memory. Once a conversation or document exceeds the window, the oldest content has to be dropped or summarized to make room, and the model effectively forgets it.

Representative window sizes: GPT-4 Turbo (128K tokens), Claude (200K), Gemini 1.5 Pro (2M). 200K tokens is roughly a 500-page book; 2M is enough to load a small codebase in full.

Larger windows enable new use cases (analyzing a whole codebase, keeping months of conversation in memory), but they come with trade-offs: cost (longer prompts cost more), latency (longer prompts take longer to process), and an effect called 'lost in the middle', where models pay less attention to information buried in the middle of a long input. For most apps, RAG still beats large context windows: retrieve the relevant 5% rather than dumping in 100%.
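The size and cost comparisons above come down to simple arithmetic. The minimal sketch below makes two assumptions: about 0.75 English words per token (the rule of thumb from the Token entry) and about 300 words per printed page (our own assumption). With those, it reproduces the 500-page figure. `request_cost` illustrates why longer prompts cost more, using hypothetical per-million-token prices rather than any provider's actual rates.

```python
WORDS_PER_TOKEN = 0.75  # rough English average; real tokenizers vary by model
WORDS_PER_PAGE = 300    # assumption: a typical printed book page


def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a given token count."""
    return round(tokens * WORDS_PER_TOKEN)


def tokens_to_pages(tokens: int, words_per_page: int = WORDS_PER_PAGE) -> int:
    """Approximate number of book pages that fit in a given token count."""
    return round(tokens_to_words(tokens) / words_per_page)


def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one API call, with prices quoted per million tokens.

    Input (prompt) and output (response) tokens are billed separately,
    so a longer prompt raises the cost even when the answer stays short.
    """
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


if __name__ == "__main__":
    for window in (128_000, 200_000, 2_000_000):
        print(f"{window:>9,} tokens ~ {tokens_to_pages(window):,} pages")
    # Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
    print(request_cost(1_000, 500, 3.0, 15.0))    # short prompt
    print(request_cost(150_000, 500, 3.0, 15.0))  # prompt filling most of a 200K window
```

Under these assumptions, 200K tokens works out to about 150,000 words, or 500 pages. Changing either constant shifts the estimate proportionally.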
GLOSSARY
What is a Context Window?
The maximum number of tokens an LLM can process in one interaction — including your prompt, conversation history, and the model's response.
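The prompt, the history, and the reply all draw on the same token budget, so a chat app typically trims old turns before each call. The minimal sketch below keeps the newest messages that fit after reserving room for the response. It makes two assumptions: a word-count heuristic stands in for a real tokenizer, and `Message` and `fit_history` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Heuristic (assumption): ~0.75 English words per token, so
    # tokens ~ words / 0.75. A real app would use the model's tokenizer.
    words = len(text.split())
    return max(1, round(words / 0.75))


def fit_history(messages: list[Message], window: int,
                reserve_for_reply: int) -> list[Message]:
    """Keep the most recent messages that fit in the window after
    reserving `reserve_for_reply` tokens for the model's response."""
    budget = window - reserve_for_reply
    kept: list[Message] = []
    used = 0
    # Walk from newest to oldest. Stop at the first message that does not
    # fit, so everything older is dropped (the model "forgets" it).
    for msg in reversed(messages):
        cost = estimate_tokens(msg.text)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [
        Message("user", "one two three"),
        Message("assistant", "four five six"),
        Message("user", "seven eight nine"),
    ]
    for m in fit_history(history, window=10, reserve_for_reply=2):
        print(m.role, m.text)
```

Stopping at the first message that does not fit keeps the retained history contiguous. Skipping that message and including older, shorter ones would leave gaps in the conversation. Summarizing the dropped turns instead of discarding them is the other common option mentioned above.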
RELATED TERMS
Token
The basic unit that LLMs read and produce. In English, one token averages roughly 0.75 words. APIs charge per token consumed and produced.
LLM (Large Language Model)
An AI system trained on massive text datasets to predict and generate human-like text — the technology behind ChatGPT, Claude, Gemini, and most modern AI chatbots.
RAG (Retrieval-Augmented Generation)
A technique that lets an LLM answer questions about information it wasn't trained on — by retrieving relevant documents at query time and stuffing them into the prompt.
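The retrieval step can be sketched as a toy example. It uses plain keyword overlap as a stand-in for the embedding similarity that production RAG systems usually rely on, and `retrieve` and `build_prompt` are hypothetical names. Only the top-ranked chunks reach the prompt, so the context window holds the relevant slice of the corpus rather than all of it.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lowercase alphanumeric words; a crude stand-in for real tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, chunk: str) -> int:
    # Keyword overlap (assumption): count shared words, respecting multiplicity.
    # Real RAG systems typically compare embedding vectors instead.
    return sum((Counter(words(query)) & Counter(words(chunk))).values())


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks with the highest nonzero overlap score."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return [ch for ch in ranked[:k] if score(query, ch) > 0]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Insert only the retrieved chunks into the prompt at query time."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The refund policy allows returns within 30 days.",
        "Our office is closed on public holidays.",
        "Shipping takes 3 to 5 business days.",
    ]
    print(build_prompt("What is the refund policy for returns?", docs, k=1))
```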