What is Inference? — AI Glossary

In AI, 'training' is the expensive, largely one-time process of teaching a model; 'inference' is the repeated process of using it, which is cheap per call but adds up with volume. For LLMs, inference cost is billed in tokens (input + output) and varies by model: GPT-4-class models have been priced at around $10 per million output tokens, while smaller models cost cents per million. Inference latency depends on the model and the request: small models can respond in roughly 100 ms, while large models typically take 1-10 seconds. Most consumer AI costs are inference costs: for a chatbot serving 1M users a day, cumulative inference spend can far exceed the one-time training cost. Inference optimization (quantization, distillation, KV caching, batching) is one of the most active areas in AI engineering in 2026.
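The token-based cost arithmetic can be made concrete with a minimal sketch. Everything named here is an assumption for illustration: the `Pricing` class, the `request_cost` and `daily_cost` helpers, the per-million prices, the token counts, and the five requests per user are hypothetical and are not taken from any vendor's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real vendor rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single inference call, billed per token."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def daily_cost(pricing: Pricing, users: int, requests_per_user: int,
               input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one day of traffic, assuming every request is the same size."""
    calls = users * requests_per_user
    return calls * request_cost(pricing, input_tokens, output_tokens)


# Assumed prices: a large model and a small model (illustrative values only).
LARGE = Pricing(input_per_million=2.50, output_per_million=10.00)
SMALL = Pricing(input_per_million=0.15, output_per_million=0.60)

if __name__ == "__main__":
    # Assumed workload: 1M users, 5 requests each, 500 input + 300 output tokens.
    for name, pricing in (("large", LARGE), ("small", SMALL)):
        cost = daily_cost(pricing, users=1_000_000, requests_per_user=5,
                          input_tokens=500, output_tokens=300)
        print(f"{name}: ${cost:,.2f} per day")
```

Under these assumed numbers, the large model comes to about $21,250 per day and the small one to about $1,275 per day, which illustrates why per-call pricing that looks negligible dominates the budget at consumer scale.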


Related terms