What is a Token?

The basic unit of text that LLMs read and produce. One token is roughly 0.75 English words. APIs charge per token, for both the input a model consumes and the output it produces.

Tokens are the pieces into which an LLM's tokenizer splits text. The word 'understanding' might be one token; 'antidisestablishmentarianism' might be five. A rule of thumb for English: 1 token ≈ 0.75 words, or about 4 characters. Different models use different tokenizers, so the token count for the same text varies from model to model.

Why tokens matter: (1) costs: API pricing is usually quoted per million tokens; (2) context windows: each model has a maximum number of tokens it can handle at once; (3) speed: inference time scales with the number of output tokens.

When estimating costs, count both input AND output tokens. A 1000-word document (about 1,333 tokens) going through a model that produces a 500-word summary (about 667 tokens) uses about 2000 tokens total. At the GPT-4.1 pricing quoted here (~$2.50/M input, $10/M output), that is roughly $0.0033 for input plus $0.0067 for output, or about one cent per document. At scale (millions of documents) it adds up fast: a million such documents would cost roughly $10,000.
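The cost estimate above can be sketched in a few lines of Python. This is a rough illustration, not a billing tool: the function names are hypothetical, the 0.75 words-per-token ratio is only the rule of thumb from this entry, and the prices are the figures quoted here, passed in as parameters because real prices differ by model and change over time. For exact counts you would run the model's own tokenizer instead.

```python
# Rule of thumb from this entry: 1 token is about 0.75 English words.
# Real tokenizers will give different counts; this is only an estimate.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count of English text from its word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_cost_usd(
    input_words: int,
    output_words: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request, counting input AND output tokens."""
    input_tokens = estimate_tokens(input_words)
    output_tokens = estimate_tokens(output_words)
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Prices are the illustrative figures quoted in this entry (assumed, not authoritative).
    per_document = estimate_cost_usd(1000, 500, 2.50, 10.00)
    print(f"Per document: ${per_document:.4f}")
    print(f"Per million documents: ${per_document * 1_000_000:,.0f}")
```

With these inputs, the estimate comes to about one cent per document. Output tokens account for roughly two thirds of that cost even though the summary is half the length of the input, because the output price is four times the input price.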
