
Overview

As you experiment, you will likely find edge cases specific to your PDF collection. You can use the following settings to dig deeper into each run.

Ignoring Files and Directories at Runtime

Within the [project-root]/data directory tree, any directory named .runtime-ignore is skipped during processing, along with its entire subtree. This is useful for troubleshooting and for moving nested directories aside without breaking the hierarchy. The leading . hides the directory in most file listings but has no effect on processing. This setting must have a value, but the directory itself does not need to exist until you use it.
See a sample directory listing with nested .runtime-ignore directories
RUNTIME_IGNORE_DIR_NAME
string
default:".runtime-ignore"
required
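The skipping behavior described above can be sketched as a directory walk that prunes ignored subtrees. This is a minimal illustration, not the pipeline's actual code: `iter_pdfs` is a hypothetical helper, and matching on a case-insensitive `.pdf` suffix is an assumption.

```python
import os
from pathlib import Path

RUNTIME_IGNORE_DIR_NAME = ".runtime-ignore"  # default value from this page

def iter_pdfs(data_root, ignore_name=RUNTIME_IGNORE_DIR_NAME):
    """Yield PDF paths under data_root, skipping any directory named ignore_name.

    Hypothetical helper for illustration only.
    """
    for dirpath, dirnames, filenames in os.walk(data_root):
        # Editing dirnames in place stops os.walk from descending into
        # ignored directories, so their whole subtree is skipped.
        dirnames[:] = [d for d in dirnames if d != ignore_name]
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield Path(dirpath) / name
```

Because the match is on the directory name alone, an ignored directory can sit at any depth of the data tree.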

Diagnostic Folder

The directory name for storing log data and diagnostic files, relative to the project root directory. An example directory listing is described on the Evaluation page.
DIAGNOSTIC_FOLDER
string
default:"logs"
required

Save Diagnostic Files

OCR is a complex process that can sometimes produce unexpected results. Enable saving diagnostic files to help troubleshoot. The following diagnostic files can be saved for each page in the document:
Images:
  • the original image, as found in the source PDF
  • the enhanced image, after processing, to make it easier for OCR to read
Text files:
  • the unstructured OCR results, as a plain text file
  • the structured OCR results, as a markdown file
  • vision analysis results (when local VLM is enabled)
Both settings default to off so that routine runs stay light on disk usage. Turn them on when you need per-page artifacts for troubleshooting OCR quality or layout detection.
SAVE_DIAG_PER_PAGE_IMG
boolean
default:"false"
Whether to save diagnostic images (original and enhanced) for each page.
SAVE_DIAG_TXT_PER_PG
boolean
default:"false"
Whether to save diagnostic text files (OCR results, structured markdown, vision analysis) for each page.
The image format for saved diagnostic images. PNG is preferred because it preserves sharp detail.
IMAGE_FORMAT
string
default:"PNG"
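To see how the three settings above interact, the sketch below lists the per-page files a run would produce for a given combination of flags. It is an assumption-laden illustration: `planned_diag_files`, the `vlm_enabled` parameter, and the file naming scheme are all hypothetical and not taken from the pipeline.

```python
from pathlib import Path

def planned_diag_files(diag_dir, page_num, save_img, save_txt,
                       image_format="PNG", vlm_enabled=False):
    """Return the per-page diagnostic paths a run would write.

    save_img mirrors SAVE_DIAG_PER_PAGE_IMG, save_txt mirrors
    SAVE_DIAG_TXT_PER_PG, image_format mirrors IMAGE_FORMAT.
    File names are hypothetical.
    """
    diag_dir = Path(diag_dir)
    stem = f"page_{page_num:03d}"
    ext = image_format.lower()
    paths = []
    if save_img:
        # Original image from the PDF plus the enhanced image fed to OCR.
        paths.append(diag_dir / f"{stem}_original.{ext}")
        paths.append(diag_dir / f"{stem}_enhanced.{ext}")
    if save_txt:
        # Unstructured OCR text and the structured markdown.
        paths.append(diag_dir / f"{stem}_ocr.txt")
        paths.append(diag_dir / f"{stem}_structured.md")
        if vlm_enabled:
            # Vision analysis exists only when the local VLM is enabled.
            paths.append(diag_dir / f"{stem}_vision.txt")
    return paths
```

With both flags off, the function returns an empty list, which matches the default of writing no per-page artifacts.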

AI Request and Response Logging

Controls whether raw AI prompts and responses are written to log files. They are never shown in the terminal output, only in log files. Applies to both:
  • OpenAI filename and metadata generation
  • Ollama vision analysis
INCLUDE_RAW_AI_REQUEST_IN_LOG
boolean
default:"true"
Whether to include raw AI requests (prompts) in log files.
INCLUDE_RAW_AI_RESPONSE_IN_LOG
boolean
default:"true"
Whether to include raw AI responses in log files.
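One way to picture these two flags is as independent gates on what reaches the log file. The sketch below is illustrative only: `ai_log_lines`, the logger name, and the line prefixes are hypothetical, and the real pipeline may format or place these entries differently.

```python
import logging

# Setting names are from this page; both default to true.
INCLUDE_RAW_AI_REQUEST_IN_LOG = True
INCLUDE_RAW_AI_RESPONSE_IN_LOG = True

def ai_log_lines(prompt, response,
                 include_request=INCLUDE_RAW_AI_REQUEST_IN_LOG,
                 include_response=INCLUDE_RAW_AI_RESPONSE_IN_LOG):
    """Build log lines for one AI exchange, honoring each flag (hypothetical helper)."""
    lines = []
    if include_request:
        lines.append(f"AI request: {prompt}")
    if include_response:
        lines.append(f"AI response: {response}")
    return lines

logger = logging.getLogger("pipeline.ai")  # hypothetical logger name
for line in ai_log_lines("Suggest a filename", "example.pdf"):
    logger.debug(line)  # goes to configured log handlers, not the terminal
```

Turning off only the request flag keeps responses in the log while omitting long prompts.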

Terminal Colors

Running the pipeline with VERBOSE-TERM gives rich output in the terminal, providing a fine-grained view of progress and potential failures. Standard ANSI color codes make the output more readable. Modify the values below to change the color scheme.
CYAN
string
default:"\\033[96m"
GREEN
string
default:"\\033[92m"
YELLOW
string
default:"\\033[93m"
BLUE
string
default:"\\033[94m"
RESET
string
default:"\\033[0m"
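As a rough illustration of how such codes are typically applied, the sketch below wraps text in a color code and then resets the terminal. It assumes the defaults above represent the standard escape sequences (ESC followed by `[96m` and so on); `colorize` is a hypothetical helper, not a function from the pipeline.

```python
# Values assumed to match the defaults listed above (standard ANSI escapes).
CYAN = "\033[96m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
BLUE = "\033[94m"
RESET = "\033[0m"

def colorize(text, color):
    """Wrap text in a color code and reset afterwards (hypothetical helper)."""
    # RESET restores the terminal default so later output is not tinted.
    return f"{color}{text}{RESET}"

print(colorize("page processed", GREEN))
print(colorize("low OCR confidence", YELLOW))
```

To change the scheme, swap in another ANSI code; for example, `"\033[95m"` is bright magenta on most terminals.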