What are tokens?

Tokens are the smallest units of text or data that an AI model can process. They can be whole words, parts of words, single characters, or even image segments.

Here’s a sample tokenization of a paragraph of text:

Token cost sample, from the OpenAI Tokenizer

How are token costs calculated?

Our script composes extracted data points into prompts for processing with OpenAI. API costs are calculated per million tokens. Costs vary by model and generally decrease over time as new models are released and competition drives prices down.
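As a rough illustration of the per-million-token arithmetic (the prices and the shape of the API_PRICING value below are placeholders, not the script’s actual configuration or current OpenAI rates):

```python
# Illustrative only: placeholder prices, not current OpenAI rates.
API_PRICING = {"input_per_million": 2.50, "output_per_million": 10.00}

def estimate_cost(input_tokens: int, output_tokens: int, pricing: dict) -> float:
    """Convert token counts into an estimated dollar cost."""
    return (
        input_tokens / 1_000_000 * pricing["input_per_million"]
        + output_tokens / 1_000_000 * pricing["output_per_million"]
    )

# e.g. 3,000 input tokens and 400 output tokens
print(f"${estimate_cost(3_000, 400, API_PRICING):.4f}")  # $0.0115
```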

We track sent and received tokens with tiktoken, OpenAI’s official tokenizer package. We then apply the API_PRICING you set for the selected MODEL_NAME.

We need to calculate these costs manually because OpenAI does not currently allow programmatic access to pricing information or actual per-request/response billed costs.
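A minimal sketch of how the counting works with tiktoken (the MODEL_NAME value and the fallback encoding are assumptions for illustration):

```python
import tiktoken

MODEL_NAME = "gpt-4o"  # whichever model you configured

def count_tokens(text: str, model: str = MODEL_NAME) -> int:
    """Count tokens the same way the API will, using the model's encoding."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fallback for models tiktoken does not recognize yet.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

prompt_tokens = count_tokens("Extract the metadata fields from the text below...")
```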

See the References section for more information.

Input Token Estimates

Here’s what to expect for token usage:

  • System message: ~100 tokens (rules and instructions), defined in prompt.py

  • Schema: ~300-400 tokens (complex nested JSON structure), defined in schemas.py

  • PDF metadata: 50-200 tokens (varies by document), extracted from the PDF

  • OCR text: ~2,000 tokens per page

  • Vision analysis: 100-300 tokens, when enabled

The input text is capped at the number of characters set by MAX_AI_INPUT_CHARS.
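A sketch of how that cap keeps input token usage bounded (the value of MAX_AI_INPUT_CHARS and the ~4 characters-per-token rule of thumb are illustrative assumptions, not the script’s exact behaviour):

```python
MAX_AI_INPUT_CHARS = 40_000  # example value; match it to your configuration

def cap_input(text: str, limit: int = MAX_AI_INPUT_CHARS) -> str:
    """Truncate extracted text so the prompt stays within the configured cap."""
    return text if len(text) <= limit else text[:limit]

ocr_text = "..." * 50_000  # placeholder for text extracted from the PDF
capped = cap_input(ocr_text)

# Rough rule of thumb: ~4 characters per English-text token, so a
# 40,000-character cap corresponds to roughly 10,000 input tokens.
```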

Output Token Estimates

  • Metadata fields: ~100 tokens

  • Value, confidence, sources: ~100 tokens

  • Decisions reasoning object: ~100 tokens

  • JSON structure overhead: ~100 tokens

We cap output tokens with the MAX_AI_OUTPUT_TOKENS setting.
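A hedged sketch of how the output cap is typically enforced, assuming the script calls the Chat Completions endpoint of the official openai package (the system message and prompt below are placeholders):

```python
from openai import OpenAI

MODEL_NAME = "gpt-4o"            # whichever model you configured
MAX_AI_OUTPUT_TOKENS = 1_000     # example value; match it to your configuration

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": "You extract structured metadata..."},  # placeholder
        {"role": "user", "content": "Document text goes here..."},            # placeholder
    ],
    max_tokens=MAX_AI_OUTPUT_TOKENS,  # hard cap on generated (output) tokens
)
```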

Reviewing Estimates After Running

Token counts and costs are written to:

  1. the terminal
  2. the log file
  3. the stats.txt file
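The exact layout of those outputs belongs to the script; the sketch below only illustrates one way a per-run summary could be echoed to the terminal, appended to a log, and written to stats.txt (file names other than stats.txt are assumptions):

```python
import logging

logging.basicConfig(filename="run.log", level=logging.INFO)  # log file name is an assumption

def report_usage(input_tokens: int, output_tokens: int, estimated_cost: float) -> None:
    """Write token counts and the estimated cost to the terminal, log file, and stats.txt."""
    summary = (
        f"Input tokens:   {input_tokens}\n"
        f"Output tokens:  {output_tokens}\n"
        f"Estimated cost: ${estimated_cost:.4f}\n"
    )
    print(summary)                       # 1. terminal
    logging.info(summary)                # 2. log file
    with open("stats.txt", "a") as fh:   # 3. stats.txt
        fh.write(summary)
```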

These are not the actual costs you will be billed; they are rough estimates for our purposes. Always review the actual costs in your OpenAI Platform account when running the script on a large batch of files.

Read more about the script’s outputs.

References