Configuration

MODEL_NAME
string
default:"gpt-4o-mini"
required

The OpenAI model used for metadata extraction and filename generation.

OpenAI offers many models for different tasks. Our pipeline requires a model that supports Structured Output, which constrains the model’s responses to our JSON schema; a request sketch follows the list of supported models below.

Supported models include:

  • gpt-4o
  • gpt-4o-mini
  • o3-mini
  • o1
  • gpt-4.5-preview
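
For example, a minimal request sketch using the official openai Python SDK with a hypothetical Pydantic schema (the project’s real schema has more fields):

    from openai import OpenAI
    from pydantic import BaseModel

    # Hypothetical schema for illustration only.
    class DocumentMetadata(BaseModel):
        title: str
        author: str
        suggested_filename: str
        reasoning: str

    client = OpenAI()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # MODEL_NAME
        messages=[{"role": "user", "content": "OCR text of the document..."}],
        response_format=DocumentMetadata,  # enforced via Structured Output
    )
    metadata = completion.choices[0].message.parsed  # DocumentMetadata instance
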
WARM_SCHEMA_CACHE
boolean
default:"false"
required

Whether to pre-warm the OpenAI schema cache. When enabled, the tool makes a minimal request to cache the schema before processing files, which avoids the roughly 10-second delay that the first request with a new schema can incur.
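
A minimal warm-up sketch, reusing the DocumentMetadata schema from the example above (the helper name is hypothetical):

    def warm_schema_cache(client: OpenAI) -> None:
        """Send a throwaway request so OpenAI compiles and caches the schema."""
        try:
            client.beta.chat.completions.parse(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "ping"}],
                response_format=DocumentMetadata,
            )
        except Exception:
            pass  # only the server-side caching matters, not the answer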

CGPT_TEMPERATURE
number
default:"0.1"
required

Controls response randomness/creativity, on a scale from 0 to 1. Lower values produce more deterministic output, which suits this project’s extraction task.

MAX_AI_INPUT_CHARS
number
default:"16000"
required

Maximum number of characters of OCR text to send to the AI model. The default covers roughly 8 pages of front matter (at ~2,000 characters per page), where most document metadata is found.
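
Truncation happens before the API call; a sketch with a hypothetical helper:

    def truncate_for_ai(ocr_text: str, max_chars: int = 16_000) -> str:
        """Keep only the leading characters, where metadata usually appears."""
        return ocr_text[:max_chars]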

MAX_AI_OUTPUT_TOKENS
number
default:"1500"
required

Maximum number of tokens the AI model can generate. This must be large enough to cover the full JSON response: the schema’s fields, the extracted metadata, the filename, and the reasoning behind it.
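
If the limit is too low, the model’s JSON is cut off mid-generation. Continuing the request sketch above, truncation can be detected via the API’s standard finish_reason field:

    choice = completion.choices[0]
    if choice.finish_reason == "length":
        # The response hit MAX_AI_OUTPUT_TOKENS before the JSON was complete.
        raise RuntimeError("Output truncated; increase MAX_AI_OUTPUT_TOKENS")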

API_PRICING
object
required

Per-model pricing used to calculate the token cost summary; a calculation sketch follows the nested entry below.

gpt-4o-mini
object
required

Pricing entry for the gpt-4o-mini model.
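
A sketch of the calculation, assuming each entry stores USD per million tokens (field names and prices are illustrative; check OpenAI’s pricing page and your actual config):

    API_PRICING = {
        "gpt-4o-mini": {"input_per_1m": 0.15, "output_per_1m": 0.60},
    }

    def cost_usd(usage, model: str = "gpt-4o-mini") -> float:
        """Convert an API usage object into a dollar amount."""
        p = API_PRICING[model]
        return (usage.prompt_tokens * p["input_per_1m"]
                + usage.completion_tokens * p["output_per_1m"]) / 1_000_000
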
API_TIMEOUT_SECONDS
number
default:"20"
required

API timeout in seconds.

API_MAX_RETRIES
number
default:"2"
required

How many times to retry on timeout. Sometimes models are slow to respond or are busy.

API_RETRY_DELAY
number
default:"2"
required

Seconds between retries.
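
Taken together, the three API_* settings behave roughly like this sketch (the openai SDK takes the timeout at client construction; the loop helper is hypothetical):

    import time
    from openai import OpenAI, APITimeoutError

    client = OpenAI(timeout=20)  # API_TIMEOUT_SECONDS

    def call_with_retries(request, max_retries: int = 2, retry_delay: float = 2.0):
        """Retry a request on timeout, sleeping between attempts."""
        for attempt in range(max_retries + 1):  # API_MAX_RETRIES
            try:
                return request()
            except APITimeoutError:
                if attempt == max_retries:
                    raise
                time.sleep(retry_delay)  # API_RETRY_DELAY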

FAQ

What’s the difference between MAX_AI_INPUT_CHARS and MAX_AI_OUTPUT_TOKENS?

These settings control different aspects of the AI processing pipeline:

  • MAX_AI_INPUT_CHARS limits the number of characters from OCR text that are sent to the AI model. This is a pre-processing step that truncates the raw text input before it’s sent to the API. It’s measured in characters (letters, numbers, spaces, etc.).

  • MAX_AI_OUTPUT_TOKENS limits how many tokens the AI model can generate in its response. This is a direct setting to the OpenAI API and affects the maximum length of the model’s output. It’s measured in tokens, which are the units OpenAI uses for processing and billing. Read more about token costs.

The distinction is important because:

  1. They operate at different stages of processing (input preparation vs. API response)
  2. They use different units of measurement (characters vs. tokens)
  3. They help control costs in different ways (reducing input data vs. limiting output generation)

Having these values defined is important in batch-processing environments, since a small mistake can lead to unexpectedly high costs.
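
As a rough illustration of the character/token gap, OpenAI’s tiktoken tokenizer can count both (o200k_base is the encoding used by the gpt-4o model family):

    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")
    text = "Invoice 2024-001 from Acme Corp, dated 2024-03-15."
    print(len(text), "chars ->", len(enc.encode(text)), "tokens")
    # English prose averages roughly 4 characters per token.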

Why are my API responses taking 10+ seconds?

Response times for OpenAI models can vary based on several factors:

  • Model complexity: More capable models like gpt-4o typically take longer than smaller models
  • Input size: Larger text inputs require more processing time
  • Server load: OpenAI’s infrastructure experiences varying load throughout the day
  • Structured output parsing: Using the structured output feature adds processing time
  • Network conditions: Latency between your application and OpenAI’s servers

For gpt-4o-mini, typical response times should be 2-5 seconds. If you’re consistently seeing 10+ seconds:

  1. Try reducing your MAX_AI_INPUT_CHARS value
  2. Consider a different model if speed is critical
  3. Ensure your network connection to OpenAI’s API is stable

The retry mechanism (API_MAX_RETRIES and API_RETRY_DELAY) handles occasional timeouts, but consistently slow responses may require configuration adjustments.
