Configuration

MODEL_NAME
string
default:"gpt-4o-mini"
required

The OpenAI model used for metadata extraction and filename generation.

OpenAI offers many models for different tasks. Our pipeline requires a model that supports Structured Output, which constrains the model’s responses to our JSON schema; a request sketch follows the list of supported models below.

Supported models include:

  • gpt-4o
  • gpt-4o-mini
  • o3-mini
  • o1
  • gpt-4.5-preview
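
For example, a minimal request sketch using the official openai Python SDK with a hypothetical Pydantic schema (the project’s real schema has more fields):

    from openai import OpenAI
    from pydantic import BaseModel

    # Hypothetical schema for illustration only.
    class DocumentMetadata(BaseModel):
        title: str
        author: str
        suggested_filename: str
        reasoning: str

    client = OpenAI()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # MODEL_NAME
        messages=[{"role": "user", "content": "OCR text of the document..."}],
        response_format=DocumentMetadata,  # enforced via Structured Output
    )
    metadata = completion.choices[0].message.parsed  # DocumentMetadata instance
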
WARM_SCHEMA_CACHE
boolean
default:"false"
required

Whether to pre-warm the OpenAI schema cache. When enabled, the tool makes a minimal request to cache the schema before processing files, which avoids the roughly 10-second delay that the first request with a new schema can incur.
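
A minimal warm-up sketch, reusing the DocumentMetadata schema from the example above (the helper name is hypothetical):

    def warm_schema_cache(client: OpenAI) -> None:
        """Send a throwaway request so OpenAI compiles and caches the schema."""
        try:
            client.beta.chat.completions.parse(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "ping"}],
                response_format=DocumentMetadata,
            )
        except Exception:
            pass  # only the server-side caching matters, not the answer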

CGPT_TEMPERATURE
number
default:"0.1"
required

Controls response randomness/creativity, on a scale from 0 to 1. Lower values produce more deterministic output, which suits this project’s extraction task.

MAX_AI_INPUT_CHARS
number
default:"16000"
required

Maximum number of characters of OCR text to send to the AI model. The default covers roughly 8 pages of front matter (at ~2,000 characters per page), where most document metadata is found.
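
Truncation happens before the API call; a sketch with a hypothetical helper:

    def truncate_for_ai(ocr_text: str, max_chars: int = 16_000) -> str:
        """Keep only the leading characters, where metadata usually appears."""
        return ocr_text[:max_chars]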

MAX_AI_OUTPUT_TOKENS
number
default:"1500"
required

Maximum number of tokens the AI model can generate. This must be large enough to cover the full JSON response: the schema’s fields, the extracted metadata, the filename, and the reasoning behind it.
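
If the limit is too low, the model’s JSON is cut off mid-generation. Continuing the request sketch above, truncation can be detected via the API’s standard finish_reason field:

    choice = completion.choices[0]
    if choice.finish_reason == "length":
        # The response hit MAX_AI_OUTPUT_TOKENS before the JSON was complete.
        raise RuntimeError("Output truncated; increase MAX_AI_OUTPUT_TOKENS")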

API_PRICING
object
required

Per-model pricing used to calculate the token cost summary; a calculation sketch follows the nested entry below.

gpt-4o-mini
object
required

Pricing entry for the gpt-4o-mini model.
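
A sketch of the calculation, assuming each entry stores USD per million tokens (field names and prices are illustrative; check OpenAI’s pricing page and your actual config):

    API_PRICING = {
        "gpt-4o-mini": {"input_per_1m": 0.15, "output_per_1m": 0.60},
    }

    def cost_usd(usage, model: str = "gpt-4o-mini") -> float:
        """Convert an API usage object into a dollar amount."""
        p = API_PRICING[model]
        return (usage.prompt_tokens * p["input_per_1m"]
                + usage.completion_tokens * p["output_per_1m"]) / 1_000_000
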
API_TIMEOUT_SECONDS
number
default:"20"
required

API timeout in seconds.

API_MAX_RETRIES
number
default:"2"
required

How many times to retry on timeout. Sometimes models are slow to respond or are busy.

API_RETRY_DELAY
number
default:"2"
required

Seconds between retries.
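
Taken together, the three API_* settings behave roughly like this sketch (the openai SDK takes the timeout at client construction; the loop helper is hypothetical):

    import time
    from openai import OpenAI, APITimeoutError

    client = OpenAI(timeout=20)  # API_TIMEOUT_SECONDS

    def call_with_retries(request, max_retries: int = 2, retry_delay: float = 2.0):
        """Retry a request on timeout, sleeping between attempts."""
        for attempt in range(max_retries + 1):  # API_MAX_RETRIES
            try:
                return request()
            except APITimeoutError:
                if attempt == max_retries:
                    raise
                time.sleep(retry_delay)  # API_RETRY_DELAY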

FAQ

What’s the difference between MAX_AI_INPUT_CHARS and MAX_AI_OUTPUT_TOKENS?

These settings control different aspects of the AI processing pipeline:

  • MAX_AI_INPUT_CHARS limits the number of characters from OCR text that are sent to the AI model. This is a pre-processing step that truncates the raw text input before it’s sent to the API. It’s measured in characters (letters, numbers, spaces, etc.).

  • MAX_AI_OUTPUT_TOKENS limits how many tokens the AI model can generate in its response. This is a direct setting to the OpenAI API and affects the maximum length of the model’s output. It’s measured in tokens, which are the units OpenAI uses for processing and billing. Read more about token costs.

The distinction is important because:

  1. They operate at different stages of processing (input preparation vs. API response)
  2. They use different units of measurement (characters vs. tokens)
  3. They help control costs in different ways (reducing input data vs. limiting output generation)

Having these values defined is important in batch-processing environments, since a small mistake can lead to unexpectedly high costs.
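
As a rough illustration of the character/token gap, OpenAI’s tiktoken tokenizer can count both (o200k_base is the encoding used by the gpt-4o model family):

    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")
    text = "Invoice 2024-001 from Acme Corp, dated 2024-03-15."
    print(len(text), "chars ->", len(enc.encode(text)), "tokens")
    # English prose averages roughly 4 characters per token.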

Why are my API responses taking 10+ seconds?

Response times for OpenAI models can vary based on several factors:

  • Model complexity: More capable models like gpt-4o typically take longer than smaller models
  • Input size: Larger text inputs require more processing time
  • Server load: OpenAI’s infrastructure experiences varying load throughout the day
  • Structured output parsing: Using the structured output feature adds processing time
  • Network conditions: Latency between your application and OpenAI’s servers

For gpt-4o-mini, typical response times should be 2-5 seconds. If you’re consistently seeing 10+ seconds:

  1. Try reducing your MAX_AI_INPUT_CHARS value
  2. Consider a different model if speed is critical
  3. Ensure your network connection to OpenAI’s API is stable

The retry mechanism (API_MAX_RETRIES and API_RETRY_DELAY) handles occasional timeouts, but consistently slow responses may require configuration adjustments.
