Known Issues

Found any bugs? Please let us know by submitting a GitHub issue.

Non-deterministic - Each processing run may produce slightly different results because of how LLMs work.
Field Stability - While core identifiers (ISBN, DOI, LOC) remain consistent, interpretive fields like author names, subtitles, years, and publisher details can vary between runs. See "What are the best use cases?"
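One way to see which fields are stable for your own documents is to process the same file several times and compare the results. The sketch below assumes each run yields a plain dictionary of metadata fields; the function name and the field names in the sample data are hypothetical and not part of the tool.

```python
def unstable_fields(runs):
    """Return {field: set of distinct values} for fields that differ across runs.

    `runs` is a list of dicts, one per processing run of the same PDF.
    The dict shape is an assumption; adapt it to the tool's real output.
    """
    # Collect every field name seen in any run.
    fields = set().union(*(run.keys() for run in runs))
    diffs = {}
    for field in sorted(fields):
        # A field missing from a run shows up as None in the value set.
        values = {run.get(field) for run in runs}
        if len(values) > 1:
            diffs[field] = values
    return diffs


# Sample data for illustration only (placeholder values).
runs = [
    {"isbn": "ISBN-PLACEHOLDER", "author": "Doe, Jane", "year": "2001"},
    {"isbn": "ISBN-PLACEHOLDER", "author": "Jane Doe", "year": "2001"},
]
print(unstable_fields(runs))
```

Fields that never appear in the result across many runs are reasonable candidates to rely on; the rest are better treated as suggestions to review.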

PDF Writing - Writing to a PDF may change its metadata unexpectedly or corrupt the file. See Evaluation for testing strategies that help avoid this.
Filename Length - Limited to 255 characters to maintain compatibility across operating systems. Currently tested on macOS only. Windows and Linux support is experimental; please report any issues on GitHub.
Unicode Support - Special characters in filenames can trigger issues on certain operating systems.
File Validation - Source PDF corruption scanning is not yet implemented.
Language Support - English is the primary supported language. Support for non-Latin character sets and right-to-left languages is limited; Arabic, Chinese, Japanese, and Korean text may produce inconsistent results.
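The filename length and Unicode items above can be illustrated with a small helper. This is a minimal sketch, not the tool's implementation: `safe_filename` is a hypothetical name, and it assumes the 255 limit is applied to UTF-8 bytes (a conservative reading, since many filesystems count bytes rather than characters) and that NFC normalization is the desired form.

```python
import os
import unicodedata

MAX_NAME_BYTES = 255  # limit from the text, applied to UTF-8 bytes (assumption)


def safe_filename(name: str, limit: int = MAX_NAME_BYTES) -> str:
    """Normalize a filename to NFC and truncate it, keeping the extension."""
    # Use one composed form so "e" + combining accent and "é" compare equal.
    name = unicodedata.normalize("NFC", name)
    stem, suffix = os.path.splitext(name)
    # Reserve room for the extension, then cut the stem at the byte budget.
    budget = max(0, limit - len(suffix.encode("utf-8")))
    encoded = stem.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character split by the slice.
    stem = encoded.decode("utf-8", errors="ignore")
    return stem + suffix


long_name = "é" * 300 + ".pdf"
print(len(safe_filename(long_name).encode("utf-8")))
```

Truncating by bytes instead of characters keeps the result within the limit even when the name contains multi-byte characters, at the cost of shorter names for non-Latin text.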

Confidence Floor - Text segments with OCR confidence below 30% are automatically discarded. This is a subjective threshold and you’ll need to experiment with your own documents to find the best balance.
Limited Coverage - Vision analysis is selective: it processes only the cover page, pages that are more than 90% image content, early pages with poor OCR, and mixed-content layouts.
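To get a feel for the confidence floor on your own documents, it can help to see how many text segments survive at different thresholds. The sketch below assumes OCR output is available as segments with a confidence in percent; `Segment`, `apply_confidence_floor`, and `retention_by_floor` are hypothetical names, not the tool's API.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 30.0  # percent, the threshold described above


@dataclass
class Segment:
    text: str
    confidence: float  # OCR confidence in percent, 0 to 100 (assumed scale)


def apply_confidence_floor(segments, floor=CONFIDENCE_FLOOR):
    """Keep segments at or above the floor; those below it are discarded."""
    return [s for s in segments if s.confidence >= floor]


def retention_by_floor(segments, floors=(10, 20, 30, 40, 50)):
    """Return {floor: fraction of segments kept} for each candidate floor."""
    total = len(segments)
    return {
        f: (len(apply_confidence_floor(segments, f)) / total if total else 0.0)
        for f in floors
    }


# Sample segments for illustration only.
segments = [Segment("Title", 95.0), Segment("smudge", 12.5), Segment("ISBN", 30.0)]
print([s.text for s in apply_confidence_floor(segments)])
```

Comparing retention across a few floors shows how much text a stricter or looser threshold would drop before you commit to one.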

Edition Filtering - First editions are automatically dropped from naming, which may not suit every use case. A configurable setting is planned for the next release.
Metadata Transfer - Carrying over existing PDF metadata to the new file is not yet implemented.

Test Coverage - The test suite is limited and focuses mainly on happy paths. Edge cases and error conditions need more coverage. See Test Cases for current scenarios.
Real Test Files - Actual PDF files still need to be created for each test case, for example books with multiple ISBN formats and various edition formats. If you have specific documents, please open a discussion. Setting up a benchmark would also be helpful; feedback and ideas for this are greatly appreciated!

See the Roadmap for planned improvements to these limitations.