Token Based Billing Explained
10/15/2025 • 3 min read
Most of the AI we use on a daily basis comes in the form of text generated by a large language model (LLM), like gpt-5. Behind the scenes, these LLMs generate text one small chunk at a time, stringing those chunks together into a final response. These chunks are called tokens. LLMs also break your prompts into tokens before reading them. Tokens are the inputs and outputs of LLMs, and the fundamental units of the language they understand.
What Are Tokens?
Think of tokens as parts of words.
Play around with the tokenizer playground below to see how text input is interpreted by LLMs.
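If you can't use the playground, here is a rough illustration in Python. This is a toy rule (splitting off a few common suffixes and punctuation), not a real BPE tokenizer, but it shows the key idea: words break into reusable sub-word chunks.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Toy rule: split off common suffixes and punctuation so words break
    # into sub-word chunks, loosely mimicking how learned tokenizers reuse
    # frequent fragments. Real tokenizers learn their splits from data.
    pieces = []
    for word in re.findall(r"\w+|[^\w\s]", text):
        m = re.match(r"(.+?)(ing|ed|ly|tion|s)$", word)
        if m and len(m.group(1)) >= 3:
            pieces.extend([m.group(1), m.group(2)])
        else:
            pieces.append(word)
    return pieces

print(toy_tokenize("Tokenization converts strings into chunks."))
# ['Tokeniza', 'tion', 'convert', 's', 'string', 's', 'into', 'chunk', 's', '.']
```

Note how "chunks" becomes two tokens: a model sees the stem and the plural suffix as separate units.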
How Product Companies Pay For AI
If you're not a programmer, you likely pay for AI products through a subscription model rather than per token. This simplifies billing by providing access to a set amount of usage each month.
Software developers, on the other hand, typically access AI capabilities by sending direct requests to providers like OpenAI, Anthropic, or Google through their APIs, where billing is often calculated based on token usage (input tokens + output tokens).
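The billing arithmetic itself is simple: input and output tokens are priced separately, usually quoted per million tokens. A sketch, using illustrative prices rather than any provider's current price list:

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 price_in_per_1m: float, price_out_per_1m: float) -> float:
    """Per-token API billing: input and output tokens are priced separately,
    with prices quoted per million tokens."""
    return (input_tokens * price_in_per_1m
            + output_tokens * price_out_per_1m) / 1_000_000

# Illustrative prices only; check your provider's current price list.
cost = api_cost_usd(input_tokens=3_000, output_tokens=800,
                    price_in_per_1m=2.50, price_out_per_1m=10.00)
print(f"${cost:.4f}")  # (3000*2.50 + 800*10.00) / 1e6 = $0.0155
```

Output tokens are typically priced higher than input tokens, so long generated responses dominate the bill.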
Products that ask a 3rd party AI provider to read or generate tokens get charged per token.
The more tokens an LLM reads or generates, the more computational work is required, and the more you pay.
This is consumption-based pricing, as opposed to subscription-based pricing.
If your team is building a product that offers AI generated responses to customers, the more they use, the more you pay.
Unlike most traditional software, the cost scales with how much your customers use it.
If your users pay a fixed rate for your product, the more AI they use, the lower your margins. Consider implementing limits on AI usage, or consumption-based pricing (above a limit).
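To see how margins erode, consider one customer on a flat-rate plan. The numbers below (a $20/month plan, a blended $5 per million tokens) are hypothetical:

```python
def monthly_margin(subscription_usd: float, tokens_used: int,
                   cost_per_1m_tokens: float) -> float:
    """Margin on one flat-rate customer: revenue is fixed,
    but the AI cost scales with how many tokens they consume."""
    ai_cost = tokens_used / 1_000_000 * cost_per_1m_tokens
    return subscription_usd - ai_cost

# Hypothetical plan: $20/month, blended $5 per million tokens.
for tokens in (1_000_000, 3_000_000, 5_000_000):
    print(tokens, monthly_margin(20.0, tokens, 5.0))
# 1000000 15.0
# 3000000 5.0
# 5000000 -5.0
```

At 5 million tokens the margin goes negative: your heaviest users are the ones losing you money, which is exactly why caps or overage pricing matter.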
How AI Providers Pay For AI
Model providers, like OpenAI, run lots of computers that listen for user requests, then perform a mathematical calculation to generate response tokens.
Serving tokens consumes power across GPUs, networking, and memory.
The minimum operating cost for a model provider is the cost of electricity used to perform that calculation. Ultimately, the cost of electrical power used per token determines their margins. If the cost of power used per token is more than they charge per token, they operate at a loss.
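You can sketch that electricity floor directly. The figures below (2 joules per generated token, $0.10/kWh) are assumptions for illustration, not measured provider numbers:

```python
def power_cost_per_1m_tokens(joules_per_token: float,
                             usd_per_kwh: float) -> float:
    """Electricity floor on serving cost: energy per generated token
    times the price of power, scaled to a million tokens."""
    kwh_per_token = joules_per_token / 3_600_000  # 1 kWh = 3.6e6 joules
    return kwh_per_token * usd_per_kwh * 1_000_000

# Assumed figures for illustration only.
floor = power_cost_per_1m_tokens(joules_per_token=2.0, usd_per_kwh=0.10)
print(f"${floor:.4f} per 1M tokens")
```

Under these assumptions the power floor is a few cents per million tokens; hardware amortisation, cooling, and networking sit on top of that, but any price below the power floor is a guaranteed loss.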
Multimedia Inputs Are Still Tokens
Interestingly, multimedia (images, video, audio) isn't exempt from this token-based billing system (at least when using OpenAI). When you feed images to multimodal models or process conversational audio, these inputs are converted into token equivalents.
How images are converted into input tokens varies by model.
A 1024x1024 image might consume 1,024 tokens, sometimes the equivalent of several pages of text.
Play around with the image tokenizer below to see how you might get charged for letting your customers process these kind of inputs.
Example Image Tokenisation
- Raw patches: ceil(1024/32) × ceil(1024/32) = 1,024
- Patch cap: 1,536
- No downscale needed (1,024 <= 1,536 patches)
- Resized: 1024 × 1024 (scale ×1.000)
- Patches used: ceil(1024/32) × ceil(1024/32) = 1,024
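The worked example above can be sketched as a patch-counting function. The 32-pixel patch size and 1,536-patch cap are assumptions modelled on the example, not an official pricing formula:

```python
import math

def image_patch_tokens(width: int, height: int,
                       patch: int = 32, cap: int = 1536) -> int:
    """Patch-based image tokenisation: split the image into patch×patch
    tiles, downscaling first if the tile count would exceed the cap.
    Patch size and cap follow the worked example, not a provider spec."""
    tiles = math.ceil(width / patch) * math.ceil(height / patch)
    if tiles <= cap:
        return tiles
    # Downscale until the tile count fits under the cap.
    scale = math.sqrt(cap / tiles)
    while True:
        w, h = int(width * scale), int(height * scale)
        tiles = math.ceil(w / patch) * math.ceil(h / patch)
        if tiles <= cap:
            return tiles
        scale *= 0.99

print(image_patch_tokens(1024, 1024))  # 32 × 32 = 1024 tiles, under the cap
```

A larger image hits the cap instead: a 2048x2048 input would be downscaled until its tile count fits under 1,536, so costs stop growing with resolution beyond that point.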
Conclusion
If you're offering AI responses or analysis as part of your product, consider:
- Usage limits -- How much can your customers use your AI?
- Consumption-based pricing -- Should you charge users more for using more AI?
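The two options above combine naturally: a flat fee covers an included token allowance, and usage beyond it is billed per token. A minimal sketch, with hypothetical plan numbers:

```python
def bill_for_period(tokens_used: int, included_tokens: int,
                    base_fee: float, overage_per_1m: float) -> float:
    """Flat fee covers an included token allowance; anything beyond it
    is billed per token (consumption-based overage)."""
    overage = max(0, tokens_used - included_tokens)
    return base_fee + overage / 1_000_000 * overage_per_1m

# Hypothetical plan: $20 base, 1M tokens included, $8 per extra million.
print(bill_for_period(500_000, 1_000_000, base_fee=20.0, overage_per_1m=8.0))
# 20.0  (within allowance)
print(bill_for_period(4_000_000, 1_000_000, base_fee=20.0, overage_per_1m=8.0))
# 44.0  (3M tokens of overage)
```

With this structure, light users see a simple subscription while heavy users pay in proportion to the cost they generate, protecting your margins.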
Key Takeaways
- Tokens are the atomic unit of work done by LLMs (for both understanding inputs, and generating outputs).
- API usage is billed per token; subscriptions hide per-token costs behind a monthly usage allowance.
- Electricity and hardware utilisation drive the baseline cost per token.
- Images/video/audio are converted to token equivalents for pricing.
When planning budgets or pricing products that rely on LLMs, reason in tokens first — it maps best to both technical limits and cost.