Token Based Billing Explained
10/15/2025 • 3 min read
Most of the AI we use on a daily basis comes in the form of text generated by a large language model (LLM), like gpt-5. Behind the scenes, these LLMs generate text one small chunk at a time, stringing those chunks together into a final response. These chunks are called tokens. LLMs also break your prompts into tokens before reading them. Tokens are the inputs and outputs of LLMs, and the fundamental units of the language they understand.
What Are Tokens?
Think of tokens as parts of words.
Play around with the tokenizer playground below to see how text input is interpreted by LLMs.
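If you can't use the playground, here is a rough illustration in Python. This is a toy rule (splitting off a few common suffixes and punctuation), not a real BPE tokenizer, but it shows the key idea: words break into reusable sub-word chunks.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Toy rule: split off common suffixes and punctuation so words break
    # into sub-word chunks, loosely mimicking how learned tokenizers reuse
    # frequent fragments. Real tokenizers learn their splits from data.
    pieces = []
    for word in re.findall(r"\w+|[^\w\s]", text):
        m = re.match(r"(.+?)(ing|ed|ly|tion|s)$", word)
        if m and len(m.group(1)) >= 3:
            pieces.extend([m.group(1), m.group(2)])
        else:
            pieces.append(word)
    return pieces

print(toy_tokenize("Tokenization converts strings into chunks."))
# ['Tokeniza', 'tion', 'convert', 's', 'string', 's', 'into', 'chunk', 's', '.']
```

Note how "chunks" becomes two tokens: a model sees the stem and the plural suffix as separate units.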
How Product Companies Pay For AI
If you're not a programmer, you likely pay for AI products through a subscription model rather than per token. This simplifies billing by providing access to a set amount of usage each month.
Software developers, on the other hand, typically access AI capabilities by sending direct requests to providers like OpenAI, Anthropic, or Google through their APIs, where billing is often calculated based on token usage (input tokens + output tokens).
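The billing arithmetic itself is simple: input and output tokens are priced separately, usually quoted per million tokens. A sketch, using illustrative prices rather than any provider's current price list:

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 price_in_per_1m: float, price_out_per_1m: float) -> float:
    """Per-token API billing: input and output tokens are priced separately,
    with prices quoted per million tokens."""
    return (input_tokens * price_in_per_1m
            + output_tokens * price_out_per_1m) / 1_000_000

# Illustrative prices only; check your provider's current price list.
cost = api_cost_usd(input_tokens=3_000, output_tokens=800,
                    price_in_per_1m=2.50, price_out_per_1m=10.00)
print(f"${cost:.4f}")  # (3000*2.50 + 800*10.00) / 1e6 = $0.0155
```

Output tokens are typically priced higher than input tokens, so long generated responses dominate the bill.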
Products that ask a 3rd party AI provider to read or generate tokens get charged per token.
The more tokens an LLM reads or generates, the more computational work is required, and the more you pay.
This is consumption-based pricing, as opposed to subscription-based pricing.
If your team is building a product that offers AI generated responses to customers, the more they use, the more you pay.
Unlike most traditional software, the cost scales with how much your customers use it.
If your users pay a fixed rate for your product, the more AI they use, the lower your margins. Consider implementing limits on AI usage, or consumption-based pricing (above a limit).
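To see how margins erode, consider one customer on a flat-rate plan. The numbers below (a $20/month plan, a blended $5 per million tokens) are hypothetical:

```python
def monthly_margin(subscription_usd: float, tokens_used: int,
                   cost_per_1m_tokens: float) -> float:
    """Margin on one flat-rate customer: revenue is fixed,
    but the AI cost scales with how many tokens they consume."""
    ai_cost = tokens_used / 1_000_000 * cost_per_1m_tokens
    return subscription_usd - ai_cost

# Hypothetical plan: $20/month, blended $5 per million tokens.
for tokens in (1_000_000, 3_000_000, 5_000_000):
    print(tokens, monthly_margin(20.0, tokens, 5.0))
# 1000000 15.0
# 3000000 5.0
# 5000000 -5.0
```

At 5 million tokens the margin goes negative: your heaviest users are the ones losing you money, which is exactly why caps or overage pricing matter.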
How AI Providers Pay For AI
Model providers, like OpenAI, run lots of computers that listen for user requests, then perform a mathematical calculation to generate response tokens.
Serving tokens consumes power across GPUs, networking, and memory.
The minimum operating cost for a model provider is the cost of electricity used to perform that calculation. Ultimately, the cost of electrical power used per token determines their margins. If the cost of power used per token is more than they charge per token, they operate at a loss.
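You can sketch that electricity floor directly. The figures below (2 joules per generated token, $0.10/kWh) are assumptions for illustration, not measured provider numbers:

```python
def power_cost_per_1m_tokens(joules_per_token: float,
                             usd_per_kwh: float) -> float:
    """Electricity floor on serving cost: energy per generated token
    times the price of power, scaled to a million tokens."""
    kwh_per_token = joules_per_token / 3_600_000  # 1 kWh = 3.6e6 joules
    return kwh_per_token * usd_per_kwh * 1_000_000

# Assumed figures for illustration only.
floor = power_cost_per_1m_tokens(joules_per_token=2.0, usd_per_kwh=0.10)
print(f"${floor:.4f} per 1M tokens")
```

Under these assumptions the power floor is a few cents per million tokens; hardware amortisation, cooling, and networking sit on top of that, but any price below the power floor is a guaranteed loss.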
Multimedia Inputs Are Still Tokens
Interestingly, multimedia (images, video, audio) isn't exempt from this token-based billing system (at least when using OpenAI). When you feed images to multimodal models or process conversational audio, these inputs are converted into token equivalents.
How images are converted into input tokens varies by model.
A 1024x1024 image might consume 1,024 tokens, sometimes the equivalent of several pages of text.
Play around with the image tokenizer below to see how you might get charged for letting your customers process these kind of inputs.
Example Image Tokenisation
- Raw patches: ceil(1024/32) × ceil(1024/32) = 1,024
- Patch cap: 1,536
- No downscale needed (1,024 <= 1,536 patches)
- Resized: 1024 × 1024 (scale ×1.000)
- Patches used: ceil(1024/32) × ceil(1024/32) = 1,024
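The worked example above can be sketched as a patch-counting function. The 32-pixel patch size and 1,536-patch cap are assumptions modelled on the example, not an official pricing formula:

```python
import math

def image_patch_tokens(width: int, height: int,
                       patch: int = 32, cap: int = 1536) -> int:
    """Patch-based image tokenisation: split the image into patch×patch
    tiles, downscaling first if the tile count would exceed the cap.
    Patch size and cap follow the worked example, not a provider spec."""
    tiles = math.ceil(width / patch) * math.ceil(height / patch)
    if tiles <= cap:
        return tiles
    # Downscale until the tile count fits under the cap.
    scale = math.sqrt(cap / tiles)
    while True:
        w, h = int(width * scale), int(height * scale)
        tiles = math.ceil(w / patch) * math.ceil(h / patch)
        if tiles <= cap:
            return tiles
        scale *= 0.99

print(image_patch_tokens(1024, 1024))  # 32 × 32 = 1024 tiles, under the cap
```

A larger image hits the cap instead: a 2048x2048 input would be downscaled until its tile count fits under 1,536, so costs stop growing with resolution beyond that point.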
Conclusion
If you're offering AI responses or analysis as part of your product, consider:
- Usage limits -- How much can your customers use your AI?
- Consumption-based pricing -- Should you charge users more for using more AI?
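The two options above combine naturally: a flat fee covers an included token allowance, and usage beyond it is billed per token. A minimal sketch, with hypothetical plan numbers:

```python
def bill_for_period(tokens_used: int, included_tokens: int,
                    base_fee: float, overage_per_1m: float) -> float:
    """Flat fee covers an included token allowance; anything beyond it
    is billed per token (consumption-based overage)."""
    overage = max(0, tokens_used - included_tokens)
    return base_fee + overage / 1_000_000 * overage_per_1m

# Hypothetical plan: $20 base, 1M tokens included, $8 per extra million.
print(bill_for_period(500_000, 1_000_000, base_fee=20.0, overage_per_1m=8.0))
# 20.0  (within allowance)
print(bill_for_period(4_000_000, 1_000_000, base_fee=20.0, overage_per_1m=8.0))
# 44.0  (3M tokens of overage)
```

With this structure, light users see a simple subscription while heavy users pay in proportion to the cost they generate, protecting your margins.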
Key Takeaways
- Tokens are the atomic unit of work done by LLMs (for both understanding inputs, and generating outputs).
- API usage is billed per token; subscriptions hide per-token costs behind a monthly usage allowance.
- Electricity and hardware utilisation drive the baseline cost per token.
- Images/video/audio are converted to token equivalents for pricing.
When planning budgets or pricing products that rely on LLMs, reason in tokens first — it maps best to both technical limits and cost.