# OpenAI

For metadata and filename generation

## Configuration
The OpenAI model used for metadata extraction and filename generation.
OpenAI offers many models for different tasks. Our pipeline requires a model that supports Structured Outputs, which steers LLM responses to conform to our schema (see the sketch after the list below).
Supported models include:

- `gpt-4o`
- `gpt-4o-mini`
- `o3-mini`
- `o1`
- `gpt-4.5-preview`
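As an illustration, here is a minimal sketch of such a Structured Outputs request using the official `openai` Python SDK. The `DocumentMetadata` schema and its fields are hypothetical placeholders, not this project’s actual schema:

```python
from openai import OpenAI
from pydantic import BaseModel


class DocumentMetadata(BaseModel):
    """Hypothetical schema; the pipeline's real schema will differ."""
    title: str
    author: str
    year: int
    suggested_filename: str


client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract metadata and suggest a filename."},
        {"role": "user", "content": "OCR text of the document goes here..."},
    ],
    response_format=DocumentMetadata,  # Structured Outputs: response must match the schema
    temperature=0.1,  # low randomness suits deterministic extraction
)
metadata = completion.choices[0].message.parsed
print(metadata.suggested_filename)
```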
Whether to pre-warm the OpenAI schema cache. When enabled, the pipeline makes a minimal request to cache the schema before processing files, which helps avoid the roughly 10-second delay on the first request with a new schema.
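A pre-warm can be as simple as a throwaway request that uses the same schema before the batch starts. A sketch, reusing the hypothetical `client` and `DocumentMetadata` from the example above:

```python
def prewarm_schema_cache() -> None:
    """Send a minimal request so OpenAI caches the schema before real work begins."""
    client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
        response_format=DocumentMetadata,  # same schema the batch will use
    )
```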
Controls response randomness/creativity. Range `0` to `1`. Lower values are better for this project’s use case.
`MAX_AI_INPUT_CHARS`: Maximum number of characters to send to the AI model. The default covers about 8 pages of front matter (at ~2,000 chars/page), where most document metadata is found.
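That default works out to roughly 8 × 2,000 = 16,000 characters. A character cutoff is just a string slice; a sketch (the exact default and implementation may differ):

```python
MAX_AI_INPUT_CHARS = 8 * 2000  # ~8 pages of front matter at ~2,000 chars/page


def truncate_ocr_text(ocr_text: str) -> str:
    """Trim raw OCR text to the character budget before sending it to the API."""
    return ocr_text[:MAX_AI_INPUT_CHARS]
```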
`MAX_AI_OUTPUT_TOKENS`: Maximum number of tokens the AI model can generate. This needs to cover the JSON schema, the metadata, the filename, and its reasoning.
Determines how the token costs summary is calculated.
API timeout in seconds.
`API_MAX_RETRIES`: How many times to retry on timeout. Models are sometimes slow to respond or busy.

`API_RETRY_DELAY`: Seconds between retries.
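Together with the timeout, `API_MAX_RETRIES` and `API_RETRY_DELAY` behave roughly like the loop below (a sketch; the numeric values and the `call_model` stand-in are illustrative, and real code would catch the SDK’s specific timeout exception):

```python
import time

API_TIMEOUT = 30      # illustrative value, in seconds
API_MAX_RETRIES = 3   # illustrative value
API_RETRY_DELAY = 5   # illustrative value, in seconds


def call_with_retries(call_model):
    """Retry a slow or busy model call, waiting between attempts."""
    for attempt in range(1, API_MAX_RETRIES + 1):
        try:
            return call_model(timeout=API_TIMEOUT)
        except TimeoutError:
            if attempt == API_MAX_RETRIES:
                raise  # give up after the last retry
            time.sleep(API_RETRY_DELAY)
```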
## FAQ
### What’s the difference between `MAX_AI_INPUT_CHARS` and `MAX_AI_OUTPUT_TOKENS`?
These settings control different aspects of the AI processing pipeline:
- `MAX_AI_INPUT_CHARS` limits the number of characters from OCR text that are sent to the AI model. This is a pre-processing step that truncates the raw text input before it’s sent to the API. It’s measured in characters (letters, numbers, spaces, etc.).
- `MAX_AI_OUTPUT_TOKENS` limits how many tokens the AI model can generate in its response. This is passed directly to the OpenAI API and caps the length of the model’s output. It’s measured in tokens, the units OpenAI uses for processing and billing. Read more about token costs.
The distinction is important because:
- They operate at different stages of processing (input preparation vs. API response)
- They use different units of measurement (characters vs. tokens)
- They help control costs in different ways (reducing input data vs. limiting output generation)
Defining these values explicitly is important in batch-processing environments, where a small mistake can lead to unexpected and expensive costs.
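To make the two stages concrete, here is a sketch showing where each limit applies in a single request (values are illustrative; `client` and `DocumentMetadata` are the hypothetical pieces from the configuration sketches above):

```python
MAX_AI_INPUT_CHARS = 16000    # stage 1: input preparation, measured in characters
MAX_AI_OUTPUT_TOKENS = 1000   # stage 2: API response cap, measured in tokens

ocr_text = "raw OCR text extracted from the PDF..."
prompt = ocr_text[:MAX_AI_INPUT_CHARS]  # truncate before the request is built

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format=DocumentMetadata,
    max_tokens=MAX_AI_OUTPUT_TOKENS,  # hard cap on generated (billed) tokens
)
```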
### Why are my API responses taking 10+ seconds?
Response times for OpenAI models can vary based on several factors:
- Model complexity: More capable models like `gpt-4o` typically take longer than smaller models
- Input size: Larger text inputs require more processing time
- Server load: OpenAI’s infrastructure experiences varying load throughout the day
- Structured output parsing: Using the structured output feature adds processing time
- Network conditions: Latency between your application and OpenAI’s servers
For `gpt-4o-mini`, typical response times should be 2-5 seconds. If you’re consistently seeing 10+ seconds:
- Try reducing your `MAX_AI_INPUT_CHARS` value
- Consider a different model if speed is critical
- Ensure your network connection to OpenAI’s API is stable
The retry mechanism (`API_MAX_RETRIES` and `API_RETRY_DELAY`) handles occasional timeouts, but consistently slow responses may require configuration adjustments.