Settings for diagnosing and troubleshooting issues
As you experiment, you will likely find edge cases specific to your PDF collection. You can use the following settings to dig deeper into each run.
Within the [project-root]/data directory tree, any directory named .runtime-ignore and its entire subtree will be skipped during processing. This is useful for troubleshooting and for moving nested directories around without breaking the hierarchy.
A leading . to hide the directory from the file system has no effect on processing.
You must set a value, but you don’t need to create the folder until you need it.
See a sample directory listing with nested .runtime-ignore directories
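The skipping behavior described above can be sketched as a directory walk that prunes ignored subtrees. This is a minimal illustration, not the pipeline's actual implementation: the function name iter_pdfs and the constant IGNORE_DIR_NAME are hypothetical, and the sketch assumes the setting is left at .runtime-ignore.

```python
import os
from pathlib import Path

# Assumed value of the ignore-directory setting; use whatever you configured.
IGNORE_DIR_NAME = ".runtime-ignore"


def iter_pdfs(data_root, ignore_name=IGNORE_DIR_NAME):
    """Yield PDF paths under data_root, skipping ignored subtrees (hypothetical helper)."""
    for current, dirnames, filenames in os.walk(data_root):
        # Pruning dirnames in place stops os.walk from descending into
        # any directory with the ignored name, at any nesting depth.
        dirnames[:] = sorted(d for d in dirnames if d != ignore_name)
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield Path(current) / name
```

Because the match is on the directory name alone, moving a problem folder under a .runtime-ignore directory takes it out of the run while keeping its internal layout intact.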
The directory name for storing log data and diagnostic files. Relative to the project root directory.
Example directory listing described on the Evaluation page.
OCR is a complex process that can sometimes produce unexpected results. Enable saving diagnostic files to help troubleshoot:
From Tesseract docs: Improving the quality of the output
For each page in the document, the following files are saved:
The file format for saved diagnostic images. PNG is preferred because it is lossless and keeps sharp detail.
Controls how AI prompts and responses are handled in logs. These are never shown in the terminal output, only in log files. Applies to both:
Whether to include raw AI requests (prompts) in log files.
Whether to include raw AI responses in log files.
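As a rough sketch of how two such switches might gate what reaches the log file, consider the following. The names make_file_logger, log_ai_exchange, log_requests and log_responses are hypothetical stand-ins for the real settings, and the placeholder text written when raw content is omitted is an assumption.

```python
import logging


def make_file_logger(log_path, name="ai-exchange"):
    """Return a logger that writes only to log_path, never to the terminal."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from root/terminal handlers
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(log_path, encoding="utf-8"))
    return logger


def log_ai_exchange(logger, prompt, response,
                    log_requests=False, log_responses=False):
    """Log one AI exchange, including raw text only where enabled."""
    if log_requests:
        logger.info("AI request: %s", prompt)
    else:
        logger.info("AI request: [omitted, %d chars]", len(prompt))
    if log_responses:
        logger.info("AI response: %s", response)
    else:
        logger.info("AI response: [omitted, %d chars]", len(response))
```

Logging a short placeholder instead of nothing keeps the sequence of calls visible in the log even when the raw text is withheld.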
Running the pipeline with VERBOSE-TERM gives rich output in the terminal, with a fine-grained view of progress and potential failures.
These standard ANSI color codes are used to make the output more readable. Modify these to change the color scheme.
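For illustration, a color scheme built from standard ANSI escape sequences might look like the sketch below. The escape codes themselves are standard, but the role names (error, warning, success, info) and the colorize helper are hypothetical and may not match the names this project uses.

```python
# Standard ANSI SGR escape sequences; the role names are illustrative only.
COLORS = {
    "error": "\033[31m",    # red
    "warning": "\033[33m",  # yellow
    "success": "\033[32m",  # green
    "info": "\033[36m",     # cyan
}
RESET = "\033[0m"  # restores the terminal's default style


def colorize(text, role, enabled=True):
    """Wrap text in the color for role; return it unchanged if disabled or unknown."""
    code = COLORS.get(role)
    if not enabled or code is None:
        return text
    return f"{code}{text}{RESET}"


if __name__ == "__main__":
    print(colorize("page 3: OCR complete", "success"))
    print(colorize("page 4: low confidence", "warning"))
```

Changing the scheme then amounts to swapping a code, for example replacing \033[36m with \033[34m to show informational lines in blue.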