Found any bugs? Please let us know by submitting a GitHub issue.

LLM Processing

  • Non-deterministic - Each processing run may produce slightly different results due to how LLMs generate text.

  • Field Stability - While core identifiers (ISBN, DOI, LOC) remain consistent, interpretative fields like author names, subtitles, years, and publisher details can vary between runs. For guidance, see What are the best use cases?
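
One way to see which fields are stable for your own documents is to process the same file several times and compare the results. The sketch below assumes each run yields a plain dictionary of extracted fields; the field names and the `field_stability` and `unstable_fields` helpers are hypothetical and not part of the tool.

```python
from collections import Counter


def field_stability(runs):
    """For each field, return the share of runs that agree with the most common value."""
    fields = set().union(*(run.keys() for run in runs))
    report = {}
    for field in sorted(fields):
        # A field missing from a run is counted as the value None.
        values = Counter(run.get(field) for run in runs)
        _, top_count = values.most_common(1)[0]
        report[field] = top_count / len(runs)
    return report


def unstable_fields(runs, threshold=1.0):
    """List the fields whose agreement share falls below the threshold."""
    return [name for name, share in field_stability(runs).items() if share < threshold]


# Hypothetical output of three runs over the same PDF.
runs = [
    {"isbn": "9780000000002", "author": "J. Smith", "year": "2001"},
    {"isbn": "9780000000002", "author": "John Smith", "year": "2001"},
    {"isbn": "9780000000002", "author": "J. Smith", "year": "2001"},
]
print(unstable_fields(runs))
```

Fields that show up in the list are candidates for manual review before they are used in a filename.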

File Operations

  • PDF Writing - May produce unexpected metadata changes or file corruption. See Evaluation for testing strategies to avoid unexpected results.

  • Filename Length - Limited to 255 characters to maintain compatibility across different operating systems. Currently tested on macOS only. Windows and Linux support is experimental - please report any issues on GitHub.

  • Unicode Support - Special characters in filenames can trigger issues on certain operating systems.

  • File Validation - Source PDF corruption scanning is not yet implemented.

  • Language Support - English is the primary supported language. Support for non-Latin character sets and right-to-left languages is limited; Arabic, Chinese, Japanese, and Korean text may produce inconsistent results.
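
The filename length and Unicode limitations above can be checked ahead of time with a small helper. This is a minimal sketch, not the tool's own logic: `safe_filename` is a hypothetical function, the set of replaced characters is an assumption based on names Windows rejects, and the length is measured in UTF-8 bytes as a conservative assumption, which also keeps the result within 255 characters.

```python
import unicodedata

MAX_FILENAME_LENGTH = 255  # limit stated above
RESERVED_CHARS = '<>:"/\\|?*'  # assumption: characters rejected by Windows


def safe_filename(stem, extension=".pdf", max_length=MAX_FILENAME_LENGTH):
    """Return a filename that fits the length limit and avoids reserved characters."""
    # Normalize so that visually identical names share one representation.
    stem = unicodedata.normalize("NFC", stem)
    # Replace reserved and control characters with an underscore.
    cleaned = "".join(
        "_" if ch in RESERVED_CHARS or ord(ch) < 32 else ch for ch in stem
    )
    cleaned = cleaned.strip(" .")
    # Truncate on UTF-8 bytes and drop any character cut in half.
    budget = max_length - len(extension.encode("utf-8"))
    cleaned = cleaned.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    cleaned = cleaned.rstrip(" .")
    return (cleaned or "untitled") + extension


print(safe_filename("Smith: A History / Vol. 2"))
```

Truncating on bytes rather than characters matters for non-Latin titles, where a single character can occupy several bytes.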

OCR and Vision Analysis

  • Confidence Floor - Text segments with OCR confidence below 30% are automatically discarded. This is a subjective threshold and you’ll need to experiment with your own documents to find the best balance.
  • Limited Coverage - Vision analysis is selective: it processes only the cover page, pages with high image content (>90%), early pages with poor OCR, and mixed-content layouts.
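
The two rules above can be sketched as follows. The 30% confidence floor and the 90% image-content threshold come from the text; everything else is an assumption made for illustration, including the `Segment` and `Page` types, the idea that the cover is page 1, and the `early_page_limit` and `poor_ocr` cut-offs, which this section does not define.

```python
from dataclasses import dataclass

OCR_CONFIDENCE_FLOOR = 0.30     # from the text above
IMAGE_CONTENT_THRESHOLD = 0.90  # from the text above


@dataclass
class Segment:
    text: str
    confidence: float  # 0.0 to 1.0


@dataclass
class Page:
    number: int                 # 1-based page number
    image_ratio: float          # share of the page covered by images
    mean_ocr_confidence: float  # 0.0 to 1.0
    has_mixed_layout: bool = False


def keep_confident(segments, floor=OCR_CONFIDENCE_FLOOR):
    """Drop OCR segments whose confidence is below the floor."""
    return [s for s in segments if s.confidence >= floor]


def needs_vision(page, early_page_limit=5, poor_ocr=0.5):
    """Decide whether a page would be sent to vision analysis (hypothetical cut-offs)."""
    if page.number == 1:  # assumption: the cover is the first page
        return True
    if page.image_ratio > IMAGE_CONTENT_THRESHOLD:
        return True
    if page.number <= early_page_limit and page.mean_ocr_confidence < poor_ocr:
        return True
    return page.has_mixed_layout
```

Passing a different `floor` to `keep_confident` is a quick way to experiment with the threshold on your own documents.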

Metadata Extraction

  • Edition Filtering - First-edition designations are automatically omitted from generated filenames. This may not be appropriate for all use cases. A configurable setting is planned for the next release.
  • Metadata Transfer - Carrying over existing PDF metadata to the new file is not yet implemented.
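
To illustrate the edition rule, the sketch below drops a first-edition label and keeps any other edition. It is not the tool's implementation: `edition_for_filename` and the pattern it uses are hypothetical, and the `keep_first` flag only imagines what the planned configurable setting might look like.

```python
import re

# Assumption: first editions are written as "1st ..." or "First ...".
_FIRST_EDITION = re.compile(r"^\s*(1st|first)\b", re.IGNORECASE)


def edition_for_filename(edition, keep_first=False):
    """Return the edition label to use in a filename, or None to leave it out."""
    if not edition:
        return None
    if not keep_first and _FIRST_EDITION.match(edition):
        return None
    return edition.strip()


print(edition_for_filename("1st Edition"))
print(edition_for_filename("2nd Edition"))
```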

Testing Limitations

  • Test Coverage - The test suite is limited and focuses mainly on happy paths. Edge cases and error conditions need more coverage. See Test Cases for current scenarios.

  • Real Test Files - Actual PDF files still need to be created for each test case, for example books with many ISBN formats and various edition formats. If you have specific documents, please open a discussion.

    Setting up a benchmark would be helpful. Feedback and ideas for this are greatly appreciated!

See the Roadmap for planned improvements to these limitations.