Diagnostic

Overview

As you experiment, you will likely find edge cases specific to your PDF collection. You can use the following settings to dig deeper into each run.

Ignoring Files and Directories at Runtime

Within the [project-root]/data directory tree, any directory named .runtime-ignore and its entire subtree will be skipped during processing. This is useful for troubleshooting and moving nested directories around without breaking the hierarchy.

The leading . only hides the directory in most file listings; it has no effect on processing.

You must set a value, but you don’t need to create the folder until you need it.

See a sample directory listing with nested .runtime-ignore directories

RUNTIME_IGNORE_DIR_NAME

string

default:".runtime-ignore"

required
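The skip rule above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function name iter_pdfs and the choice to filter on the .pdf extension are assumptions made for the example. Only the setting name and its default come from this page.

```python
import os
from pathlib import Path

RUNTIME_IGNORE_DIR_NAME = ".runtime-ignore"  # default documented above


def iter_pdfs(data_root, ignore_name=RUNTIME_IGNORE_DIR_NAME):
    """Yield PDF paths under data_root, skipping every ignored subtree.

    Hypothetical helper: the name and the .pdf filter are assumptions.
    """
    for dirpath, dirnames, filenames in os.walk(data_root):
        # Editing dirnames in place stops os.walk from descending into
        # any directory with the ignored name, at any nesting depth.
        dirnames[:] = sorted(d for d in dirnames if d != ignore_name)
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield Path(dirpath) / name
```

Because the match is on the directory name alone, moving a folder into or out of a .runtime-ignore directory is enough to exclude or restore it without changing any other setting.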

Diagnostic Folder

The directory name for storing log data and diagnostic files. Relative to the project root directory.

An example directory listing is described on the Evaluation page.

DIAGNOSTIC_FOLDER

string

default:"logs"

required

Save Diagnostic Files

OCR is a complex process that can sometimes produce unexpected results. Enable saving diagnostic files to help troubleshoot.

From the Tesseract docs: Improving the quality of the output

For each page in the document, the following files are saved:

the original image, as found in the source PDF
the enhanced image, after processing, to make it easier for OCR to read
the unstructured OCR results, as a plain text file
the structured OCR results, as a markdown file

SAVE_DIAGNOSTIC_FILES

boolean

default:"true"

The image format for saved diagnostic images. PNG is preferred because it preserves sharp detail.

IMAGE_FORMAT

string

default:"PNG"
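To show how DIAGNOSTIC_FOLDER, SAVE_DIAGNOSTIC_FILES, and IMAGE_FORMAT might combine, here is a rough sketch that builds the four per-page file paths listed above. The function name, the per-document subfolder, and the file naming pattern are all hypothetical; the actual layout is the one described on the Evaluation page.

```python
from pathlib import Path


def diagnostic_paths(project_root, doc_stem, page_number,
                     folder="logs", save=True, image_format="PNG"):
    """Return the four per-page diagnostic paths, or {} when saving is off.

    The keyword defaults mirror DIAGNOSTIC_FOLDER, SAVE_DIAGNOSTIC_FILES,
    and IMAGE_FORMAT. The naming scheme below is an assumption.
    """
    if not save:
        return {}
    # DIAGNOSTIC_FOLDER is resolved relative to the project root.
    page_dir = Path(project_root) / folder / doc_stem
    prefix = f"page-{page_number:04d}"
    ext = image_format.lower()
    return {
        "original_image": page_dir / f"{prefix}-original.{ext}",
        "enhanced_image": page_dir / f"{prefix}-enhanced.{ext}",
        "ocr_text": page_dir / f"{prefix}-ocr.txt",
        "ocr_markdown": page_dir / f"{prefix}-ocr.md",
    }
```

Keeping the original and enhanced images side by side makes it easier to tell whether a bad OCR result comes from the source scan or from the enhancement step.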

AI Request and Response Logging

Controls how AI prompts and responses are handled in logs. These are never shown in the terminal output, only in log files. Applies to both:

OpenAI filename and metadata generation
Ollama vision analysis

INCLUDE_RAW_AI_REQUEST_IN_LOG

boolean

default:"true"

Whether to include raw AI requests (prompts) in log files.

INCLUDE_RAW_AI_RESPONSE_IN_LOG

boolean

default:"true"

Whether to include raw AI responses in log files.
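The two flags can be read as independent filters on what gets written to the log file. The sketch below illustrates that idea under stated assumptions: the function name, the JSON entry format, and the field names are hypothetical and are not taken from the pipeline.

```python
import json


def build_ai_log_entry(source, request, response,
                       include_request=True, include_response=True):
    """Build one JSON log line for an AI call.

    include_request and include_response mirror
    INCLUDE_RAW_AI_REQUEST_IN_LOG and INCLUDE_RAW_AI_RESPONSE_IN_LOG.
    The entry shape is an assumption made for this example.
    """
    entry = {"source": source}
    if include_request:
        entry["request"] = request    # raw prompt text
    if include_response:
        entry["response"] = response  # raw model output
    return json.dumps(entry)
```

Turning off the request flag while keeping the response flag on, for example, keeps long prompts out of the log while still recording what the model returned.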

Terminal Colors

Running the pipeline with VERBOSE-TERM gives rich output in the terminal. This gives a fine-grained view into progress and potential failures.

These standard ANSI color codes are used to make the output more readable. Modify these to change the color scheme.

CYAN

string

default:"\\033[96m"

GREEN

string

default:"\\033[92m"

YELLOW

string

default:"\\033[93m"

BLUE

string

default:"\\033[94m"

RESET

string

default:"\\033[0m"
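As a small illustration of how these codes are typically applied, the sketch below wraps a message in a color code and then resets the terminal. It assumes the doubled backslash in the defaults above is escaping for the ESC character (octal 033); the colorize helper is hypothetical.

```python
# Values copied from the defaults above, written as Python escapes.
CYAN = "\033[96m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
BLUE = "\033[94m"
RESET = "\033[0m"


def colorize(text, color, reset=RESET):
    """Wrap text in an ANSI color code and reset afterwards.

    Hypothetical helper. Without the trailing reset, the color would
    carry over into everything printed later in the terminal.
    """
    return f"{color}{text}{reset}"


if __name__ == "__main__":
    print(colorize("page processed", GREEN))
    print(colorize("low OCR confidence", YELLOW))
```

To change the scheme, replace the number before the m with another standard code, for example 91 for bright red.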
