RAG solves a real LLM limitation: training data has a cutoff date, and most companies' internal documents are never in the training data at all. Instead of retraining the model (slow, expensive), RAG splits your documents into chunks, embeds them, and stores them in a vector database (Pinecone, Weaviate, Postgres with pgvector). At query time it embeds the user's question into the same vector space, finds the most relevant chunks, and feeds those chunks to the LLM as additional context. The model then answers based on those chunks. Result: an LLM that can answer questions about your specific docs, with citations, without expensive retraining. RAG dominates enterprise AI in 2026: almost every 'chat with your PDFs' or internal-knowledge bot is RAG under the hood. The hard parts: choosing good chunk sizes, getting embeddings to actually match user intent, and managing the latency of retrieve-then-generate flows.
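The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the corpus, the `embed`, `retrieve`, and `build_prompt` helpers, and the prompt wording are all hypothetical, and the bag-of-words "embedding" is a stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for pre-chunked internal documents.
CHUNKS = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Our office is closed on public holidays and weekends.",
    "Enterprise plans include single sign-on and audit logs.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (a vector DB does this at scale)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved chunks in the prompt as numbered, citable sources."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )

question = "How long do refunds take?"
top_chunks = retrieve(question, CHUNKS, k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)  # this string is what would be sent to the LLM
```

Numbering the sources in the prompt is one simple way to get the citations mentioned above: the model can refer to `[1]` or `[2]`, and the application maps those numbers back to the original documents.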
What is RAG (Retrieval-Augmented Generation)?
A technique that lets an LLM answer questions about information it wasn't trained on — by retrieving relevant documents at query time and stuffing them into the prompt.
Related terms
Embedding
A vector representation of text, image, or audio — a list of numbers that captures the semantic meaning, enabling 'find similar' searches.
Vector Database
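A database optimized for storing and searching embeddings: it finds the nearest vectors to a query vector in milliseconds, even across billions of records, typically by using approximate nearest-neighbor indexes rather than comparing against every record.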
LLM (Large Language Model)
An AI system trained on massive text datasets to predict and generate human-like text — the technology behind ChatGPT, Claude, Gemini, and most modern AI chatbots.
Context Window
The maximum number of tokens an LLM can process in one interaction — including your prompt, conversation history, and the model's response.
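Because the context window covers the prompt, the retrieved chunks, and the model's response together, a RAG system has to decide how many chunks it can afford to include. The sketch below shows one simple way to do that, under stated assumptions: `count_tokens` and `fit_chunks` are hypothetical helpers, and tokens are approximated as whitespace-separated words, whereas a real system would use the model's own tokenizer.

```python
def count_tokens(text: str) -> int:
    """Rough approximation: one token per whitespace-separated word."""
    return len(text.split())

def fit_chunks(
    chunks: list[str],
    context_window: int,
    prompt_tokens: int,
    reserved_for_answer: int,
) -> list[str]:
    """Keep the highest-ranked chunks that fit in the remaining token budget.

    Assumes `chunks` is already sorted from most to least relevant.
    """
    budget = context_window - prompt_tokens - reserved_for_answer
    selected: list[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return selected

ranked_chunks = [
    "alpha beta gamma delta",       # 4 approximate tokens
    "epsilon zeta eta",             # 3 approximate tokens
    "theta iota kappa lambda mu",   # 5 approximate tokens
]

# Budget: 20 - 5 - 7 = 8 tokens, so only the first two chunks (4 + 3) fit.
kept = fit_chunks(ranked_chunks, context_window=20, prompt_tokens=5, reserved_for_answer=7)
print(kept)
```

Stopping at the first chunk that does not fit keeps the selection in strict relevance order; skipping it and trying smaller, lower-ranked chunks is an alternative that uses more of the budget at the cost of including less relevant text.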