Overview

As you experiment, you will likely find edge cases specific to your PDF collection. Use the following settings to dig deeper into each run.

Ignoring Files and Directories at Runtime

Within the [project-root]/data directory tree, any directory named .runtime-ignore and its entire subtree will be skipped during processing. This is useful for troubleshooting and moving nested directories around without breaking the hierarchy.

The leading . only hides the directory in file listings; it has no effect on processing.

The setting must have a value, but the directory itself does not need to exist until you want to use it.

See a sample directory listing with nested .runtime-ignore directories

RUNTIME_IGNORE_DIR_NAME
string
default:".runtime-ignore"
required
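
The skip rule described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual implementation: the function name iter_pdfs is hypothetical, and it assumes the pipeline looks for files ending in .pdf under the data directory.

```python
import os
from pathlib import Path

RUNTIME_IGNORE_DIR_NAME = ".runtime-ignore"  # default value from this page


def iter_pdfs(data_root, ignore_name=RUNTIME_IGNORE_DIR_NAME):
    """Yield PDF paths under data_root, skipping ignored directories at any depth.

    Hypothetical helper for illustration; the real pipeline may differ.
    """
    for dirpath, dirnames, filenames in os.walk(data_root):
        # Pruning dirnames in place stops os.walk from descending into
        # the ignored directory, so its entire subtree is skipped.
        dirnames[:] = sorted(d for d in dirnames if d != ignore_name)
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield Path(dirpath) / name
```

Because the directory is matched by name at every level, moving a problem folder into a .runtime-ignore directory beside it takes it out of the run without changing the rest of the hierarchy.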

Diagnostic Folder

The directory name for storing log data and diagnostic files. Relative to the project root directory.

An example directory listing is described on the Evaluation page.

DIAGNOSTIC_FOLDER
string
default:"logs"
required

Save Diagnostic Files

OCR is a complex process that can sometimes produce unexpected results. Enable saving diagnostic files to help troubleshoot.

For each page in the document, the following files are saved:

  • the original image, as found in the source PDF
  • the enhanced image, after processing, to make it easier for OCR to read
  • the unstructured OCR results, as a plain text file
  • the structured OCR results, as a Markdown file

SAVE_DIAGNOSTIC_FILES
boolean
default:"true"

The image format for saved diagnostic images. PNG is preferred because it preserves sharp detail.

IMAGE_FORMAT
string
default:"PNG"
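
As a rough sketch of how these settings could combine, the snippet below builds the four per-page file paths listed above. The function diagnostic_paths, the per-document subfolder, and the file naming pattern are all assumptions made for illustration; the actual layout is the one described on the Evaluation page.

```python
from pathlib import Path

DIAGNOSTIC_FOLDER = "logs"    # default from this page, relative to the project root
SAVE_DIAGNOSTIC_FILES = True  # default from this page
IMAGE_FORMAT = "PNG"          # default from this page


def diagnostic_paths(project_root, pdf_stem, page_number,
                     save=SAVE_DIAGNOSTIC_FILES,
                     folder=DIAGNOSTIC_FOLDER,
                     image_format=IMAGE_FORMAT):
    """Return hypothetical per-page diagnostic paths, or {} when saving is off."""
    if not save:
        return {}
    base = Path(project_root) / folder / pdf_stem  # assumed layout
    ext = image_format.lower()
    prefix = f"page-{page_number:03d}"             # assumed naming pattern
    return {
        "original_image": base / f"{prefix}-original.{ext}",
        "enhanced_image": base / f"{prefix}-enhanced.{ext}",
        "ocr_text": base / f"{prefix}-ocr.txt",
        "ocr_markdown": base / f"{prefix}-ocr.md",
    }
```

Returning an empty mapping when saving is disabled lets calling code loop over the result without a separate check.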

AI Request and Response Logging

These settings control whether AI prompts and responses are written to the logs. Prompts and responses are never shown in the terminal output, only in log files. The settings apply to both:

  • OpenAI filename and metadata generation
  • Ollama vision analysis

INCLUDE_RAW_AI_REQUEST_IN_LOG
boolean
default:"true"

Whether to include raw AI requests (prompts) in log files.

INCLUDE_RAW_AI_RESPONSE_IN_LOG
boolean
default:"true"

Whether to include raw AI responses in log files.
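
One way to picture how the two flags might gate log content is the sketch below, which uses Python's standard logging module. The helper names make_file_only_logger and log_ai_exchange, the logger name, and the message layout are hypothetical and are not taken from the project.

```python
import logging

INCLUDE_RAW_AI_REQUEST_IN_LOG = True   # default from this page
INCLUDE_RAW_AI_RESPONSE_IN_LOG = True  # default from this page


def make_file_only_logger(log_path):
    """Create a logger that writes to a file and never to the terminal."""
    logger = logging.getLogger("ai_io_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from root (console) handlers
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(log_path, encoding="utf-8"))
    return logger


def log_ai_exchange(logger, service, prompt, response,
                    include_request=INCLUDE_RAW_AI_REQUEST_IN_LOG,
                    include_response=INCLUDE_RAW_AI_RESPONSE_IN_LOG):
    """Log one AI call, including raw text only where the flags allow it."""
    parts = [f"service={service}"]
    if include_request:
        parts.append(f"request={prompt!r}")
    if include_response:
        parts.append(f"response={response!r}")
    message = " ".join(parts)
    logger.info(message)
    return message
```

Turning off one flag while keeping the other can be useful when prompts are long and repetitive but the responses are what you need to inspect.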

Terminal Colors

Running the pipeline with VERBOSE-TERM produces rich terminal output, giving a fine-grained view of progress and potential failures.

The output uses the standard ANSI color codes below for readability. Modify them to change the color scheme.

CYAN
string
default:"\\033[96m"
GREEN
string
default:"\\033[92m"
YELLOW
string
default:"\\033[93m"
BLUE
string
default:"\\033[94m"
RESET
string
default:"\\033[0m"
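
A minimal sketch of how such codes are typically applied follows. It assumes the doubled backslash in the defaults above is documentation escaping and that the effective value begins with the ESC character, written "\033" in a Python string; the colorize helper is hypothetical.

```python
# ANSI escape sequences: "\033" is the ESC character (0x1b).
CYAN = "\033[96m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
BLUE = "\033[94m"
RESET = "\033[0m"


def colorize(text, color, enabled=True):
    """Wrap text in a color code and reset afterwards (hypothetical helper)."""
    if not enabled:
        return text
    # RESET restores the terminal's default color so later output is unaffected.
    return f"{color}{text}{RESET}"


if __name__ == "__main__":
    print(colorize("page 1: OCR complete", GREEN))
    print(colorize("page 2: low confidence, retrying", YELLOW))
```

Always appending RESET matters: without it, the color would carry over into every line printed afterwards.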