Diagnostic
Settings for diagnosing and troubleshooting issues
Overview
As you experiment, you will likely find edge cases specific to your PDF collection. You can use the following settings to dig deeper into each run.
Ignoring Files and Directories at Runtime
Within the [project-root]/data directory tree, any directory named .runtime-ignore and its entire subtree will be skipped during processing. This is useful for troubleshooting and moving nested directories around without breaking the hierarchy. A leading . to hide the directory from the file system has no effect on processing.
You must set a value, but you don’t need to create the folder until you need it.
See a sample directory listing with nested .runtime-ignore directories
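The skipping behavior described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual implementation: the function name iter_pdfs, the constant IGNORE_DIR_NAME, and the assumption that only .pdf files are of interest are all hypothetical.

```python
import os
from pathlib import Path

# Assumed value of the ignore setting; the real setting name is not shown here.
IGNORE_DIR_NAME = ".runtime-ignore"


def iter_pdfs(data_root, ignore_name=IGNORE_DIR_NAME):
    """Yield PDF paths under data_root, skipping every ignored subtree."""
    for current, dirnames, filenames in os.walk(data_root):
        # Editing dirnames in place stops os.walk from descending into
        # the removed directories, so the whole subtree is skipped.
        dirnames[:] = sorted(d for d in dirnames if d != ignore_name)
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield Path(current) / name
```

Because the check matches the directory name at any depth, a nested folder can be moved into or out of a .runtime-ignore directory without changing the rest of the hierarchy.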
Diagnostic Folder
The directory name for storing log data and diagnostic files, relative to the project root directory.
An example directory listing is described on the Evaluation page.
Save Diagnostic Files
OCR is a complex process that can sometimes produce unexpected results. Enable saving diagnostic files to help troubleshoot:
From Tesseract docs: Improving the quality of the output
For each page in the document, the following files are saved:
- the original image, as found in the source PDF
- the enhanced image, after processing, to make it easier for OCR to read
- the unstructured OCR results, as a plain text file
- the structured OCR results, as a markdown file
The image format used for saved diagnostic images. PNG is preferred for maintaining sharp detail.
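To make the per-page output concrete, here is a minimal sketch of how the four files for one page might be laid out on disk. The naming scheme (page-NNN-original, page-NNN-enhanced, and so on), the function page_artifact_paths, and its parameters are assumptions made for illustration; the pipeline's real file names may differ.

```python
from pathlib import Path


def page_artifact_paths(diag_dir, doc_stem, page_number, image_ext="png"):
    """Return hypothetical paths for the four diagnostic files of one page."""
    base = Path(diag_dir) / doc_stem
    prefix = f"page-{page_number:03d}"  # zero-padded so listings sort by page
    return {
        "original_image": base / f"{prefix}-original.{image_ext}",
        "enhanced_image": base / f"{prefix}-enhanced.{image_ext}",
        "ocr_text": base / f"{prefix}-ocr.txt",
        "ocr_markdown": base / f"{prefix}-ocr.md",
    }
```

Keeping the original and enhanced images side by side under one prefix makes it easy to compare what OCR received against what the PDF contained.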
AI Request and Response Logging
Controls how AI prompts and responses are handled in logs. These are never shown in the terminal output, only in log files. Applies to both:
- OpenAI filename and metadata generation
- Ollama vision analysis
Whether to include raw AI requests (prompts) in log files.
Whether to include raw AI responses in log files.
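The following sketch shows one way the two logging switches could behave, using Python's standard logging module. It is an assumption-laden illustration: build_ai_logger, log_ai_exchange, and the log_requests and log_responses parameters are hypothetical names standing in for the actual settings.

```python
import logging


def build_ai_logger(log_path, name="ai_calls"):
    """Create a logger that writes to a file only, never to the terminal."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # do not pass records to root (console) handlers
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_ai_exchange(logger, provider, prompt, response,
                    log_requests=False, log_responses=False):
    """Always log a summary; include raw text only when the flags allow it."""
    logger.info("%s call: prompt_chars=%d response_chars=%d",
                provider, len(prompt), len(response))
    if log_requests:
        logger.debug("%s request: %s", provider, prompt)
    if log_responses:
        logger.debug("%s response: %s", provider, response)
```

Logging a size summary unconditionally keeps a record that a call happened even when raw prompts and responses are excluded, for example when documents contain sensitive content.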
Terminal Colors
Running the pipeline with VERBOSE-TERM gives rich output in the terminal. This gives a fine-grained view into the progress and potential failures.
These standard ANSI color codes are used to make the output more readable. Modify these to change the color scheme.
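As a rough illustration of how such color settings are typically applied, the sketch below wraps text in standard ANSI SGR escape sequences. The COLORS mapping, its keys, and the colorize helper are hypothetical; only the escape codes themselves (31 red, 32 green, 33 yellow, 36 cyan, 0 reset) are standard.

```python
# Standard ANSI SGR escape sequences; the key names are illustrative only.
RESET = "\033[0m"
COLORS = {
    "info": "\033[36m",     # cyan
    "success": "\033[32m",  # green
    "warning": "\033[33m",  # yellow
    "error": "\033[31m",    # red
}


def colorize(text, kind):
    """Wrap text in the color for kind; return it unchanged if kind is unknown."""
    code = COLORS.get(kind)
    if code is None:
        return text
    return f"{code}{text}{RESET}"


if __name__ == "__main__":
    print(colorize("page 3: OCR complete", "success"))
    print(colorize("page 4: low confidence", "warning"))
```

Changing a color scheme then amounts to editing the values in the mapping, and the reset code after each message prevents the color from leaking into later output.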