Known Issues
Current bugs and limitations
Found any bugs? Please let us know by submitting a GitHub issue.
LLM Processing
- Non-deterministic - Each processing run may produce slightly different results, because LLM output is not guaranteed to be identical from one call to the next.
- Field Stability - Core identifiers (ISBN, DOI, LOC) stay consistent, but interpretive fields such as author names, subtitles, years, and publisher details can vary between runs. For guidance, see "What are the best use cases?"
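One way to see which fields drift is to process the same PDF several times and measure how often each field agrees with its most common value. The sketch below assumes each run yields a flat dict of field names to hashable values; `field_agreement` and `unstable_fields` are hypothetical helpers for illustration, not part of the tool.

```python
from collections import Counter
from typing import Any


def field_agreement(runs: list[dict[str, Any]]) -> dict[str, float]:
    """For each field, the share of runs that agree with its most common value."""
    fields = {key for run in runs for key in run}
    agreement: dict[str, float] = {}
    for field in sorted(fields):
        # A field missing from a run counts as None for that run.
        values = [run.get(field) for run in runs]
        _, top_count = Counter(values).most_common(1)[0]
        agreement[field] = top_count / len(runs)
    return agreement


def unstable_fields(runs: list[dict[str, Any]], threshold: float = 1.0) -> list[str]:
    """Fields whose agreement is below `threshold` (1.0 flags any disagreement)."""
    return [f for f, score in field_agreement(runs).items() if score < threshold]


# Three hypothetical runs over the same document.
sample = [
    {"isbn": "9780306406157", "author": "Doe, Jane", "year": 2020},
    {"isbn": "9780306406157", "author": "Jane Doe", "year": 2020},
    {"isbn": "9780306406157", "author": "Doe, Jane", "year": 2020},
]
print(field_agreement(sample))  # {'author': 0.6666666666666666, 'isbn': 1.0, 'year': 1.0}
print(unstable_fields(sample))  # ['author']
```

In this toy input the identifier agrees across all runs while the author name does not, which mirrors the pattern described above.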
File Operations
- PDF Writing - Writing to a PDF may cause unexpected metadata changes or file corruption. See Evaluation for testing strategies that help avoid unexpected results.
- Filename Length - Filenames are capped at 255 characters for compatibility across operating systems. So far this has been tested on macOS only; Windows and Linux support is experimental, so please report any issues on GitHub.
- Unicode Support - Special characters in filenames can trigger issues on certain operating systems.
- File Validation - Scanning source PDFs for corruption is not yet implemented.
- Language Support - English is the primary supported language. Support for non-Latin scripts and right-to-left languages is limited, and Arabic, Chinese, Japanese, and Korean text may produce inconsistent results.
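The filename length and Unicode limitations above can be handled defensively when building a name. The sketch below is an assumption, not the tool's code: `safe_filename` is a hypothetical helper that NFC-normalizes the name, replaces characters Windows forbids (plus control characters), and truncates to a byte budget. The page states a 255-character limit; this sketch budgets 255 UTF-8 bytes instead, which is stricter for non-ASCII names because some filesystems (for example ext4 on Linux) count bytes rather than characters. It does not handle Windows reserved names such as `CON` or `NUL`.

```python
import unicodedata

# Characters Windows forbids in filenames; "/" is also forbidden on POSIX.
_FORBIDDEN = set('<>:"/\\|?*')
MAX_BYTES = 255  # assumed byte budget, stricter than a 255-character cap


def safe_filename(stem: str, ext: str = ".pdf", max_bytes: int = MAX_BYTES) -> str:
    # NFC composes sequences like "e" + combining accent into one code point,
    # so the same title is not spelled two different ways on disk.
    stem = unicodedata.normalize("NFC", stem)
    # Replace forbidden and control characters with an underscore.
    cleaned = "".join(
        "_" if ch in _FORBIDDEN or unicodedata.category(ch) == "Cc" else ch
        for ch in stem
    )
    # Windows rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .") or "untitled"
    budget = max_bytes - len(ext.encode("utf-8"))
    encoded = cleaned.encode("utf-8")[:budget]
    # The byte slice may cut a multi-byte character; drop the partial bytes.
    truncated = encoded.decode("utf-8", errors="ignore").rstrip(" .") or "untitled"
    return truncated + ext


print(safe_filename('Title: A "Study" of Caf\u00e9s?'))  # Title_ A _Study_ of Cafés_.pdf
print(len(safe_filename("\u00e9" * 300).encode("utf-8")))  # 254
```

Truncating by bytes and then decoding with `errors="ignore"` keeps the result valid UTF-8 even when the cut lands inside a two-byte character such as "é".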
OCR and Vision Analysis
- Confidence Floor - Text segments with an OCR confidence below 30% are automatically discarded. This threshold is a heuristic, so experiment with your own documents to find the right balance.
- Limited Coverage - Vision analysis is selective: it processes only the cover page, pages with high image content (>90%), early pages with poor OCR, and pages with mixed-content layouts.
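The two rules above can be expressed as a filter and a predicate. This is a hedged sketch, not the tool's implementation: the `OcrSegment` and `PageStats` types are hypothetical, and because the page only says "early pages" and "poor OCR", the cutoffs `EARLY_PAGE_LIMIT = 5` and `POOR_OCR_CONFIDENCE = 0.60` are invented placeholders. The 30% floor and the 90% image threshold come from the list above.

```python
from dataclasses import dataclass

OCR_CONFIDENCE_FLOOR = 0.30  # segments below 30% confidence are discarded
IMAGE_HEAVY_RATIO = 0.90     # pages with more than 90% image content

# Hypothetical placeholders; the documentation gives no exact values.
EARLY_PAGE_LIMIT = 5
POOR_OCR_CONFIDENCE = 0.60


@dataclass
class OcrSegment:
    text: str
    confidence: float  # 0.0 to 1.0


@dataclass
class PageStats:
    index: int                  # 0-based page number; 0 is the cover
    image_ratio: float          # fraction of the page covered by images
    mean_ocr_confidence: float  # average OCR confidence on the page
    mixed_layout: bool          # text and images interleaved


def keep_confident(segments: list[OcrSegment],
                   floor: float = OCR_CONFIDENCE_FLOOR) -> list[OcrSegment]:
    """Drop segments whose confidence is below the floor."""
    return [s for s in segments if s.confidence >= floor]


def needs_vision(page: PageStats) -> bool:
    """True if the page matches any of the selective vision-analysis rules."""
    return (
        page.index == 0
        or page.image_ratio > IMAGE_HEAVY_RATIO
        or (page.index < EARLY_PAGE_LIMIT
            and page.mean_ocr_confidence < POOR_OCR_CONFIDENCE)
        or page.mixed_layout
    )


pages = [
    PageStats(0, 0.10, 0.95, False),   # cover
    PageStats(3, 0.20, 0.40, False),   # early page with poor OCR
    PageStats(40, 0.95, 0.90, False),  # image-heavy page
    PageStats(41, 0.10, 0.92, False),  # ordinary text page
]
print([p.index for p in pages if needs_vision(p)])  # [0, 3, 40]
```

Raising the floor discards more noise but also more genuine text, which is why tuning it against your own documents matters.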
Metadata Extraction
- Edition Filtering - First-edition labels are automatically omitted from generated filenames. This may not suit every use case; a configurable setting is planned for the next release.
- Metadata Transfer - Carrying over existing PDF metadata to the new file is not yet implemented.
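Since the edition behavior is slated to become configurable, here is one way such a toggle could look. Everything in this sketch is hypothetical: the `keep_first_edition` flag, the `Author - Title - Edition - Year.pdf` pattern, and the helper names are illustrations, not the tool's actual settings or naming scheme.

```python
import re
from typing import Optional

# Matches common first-edition spellings: "1st", "First", "1st ed.", "First Edition".
_FIRST_EDITION = re.compile(r"^\s*(1st|first)(\s+ed(ition|\.)?)?\s*$", re.IGNORECASE)


def edition_label(edition: Optional[str], keep_first_edition: bool = False) -> Optional[str]:
    """Return the edition text to use in a filename, or None to omit it."""
    if not edition:
        return None
    if _FIRST_EDITION.match(edition) and not keep_first_edition:
        return None
    return edition.strip()


def build_filename(author: str, title: str, year: int,
                   edition: Optional[str] = None,
                   keep_first_edition: bool = False) -> str:
    parts = [author, title]
    label = edition_label(edition, keep_first_edition)
    if label:
        parts.append(label)
    parts.append(str(year))
    return " - ".join(parts) + ".pdf"


print(build_filename("Doe", "Example Book", 2020, "1st ed."))      # Doe - Example Book - 2020.pdf
print(build_filename("Doe", "Example Book", 2020, "2nd edition"))  # Doe - Example Book - 2nd edition - 2020.pdf
```

Defaulting the flag to `False` reproduces the current behavior, so an opt-in setting would not change existing filenames.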
Testing Limitations
- Test Coverage - The test suite is limited and focuses mainly on happy paths; edge cases and error conditions need more coverage. See Test Cases for the current scenarios.
- Real Test Files - Real PDF files still need to be created for each test case, for example books with several ISBN formats and a variety of edition formats. If you have suitable documents, please start a discussion.
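Test cases covering many ISBN formats need a reference normalizer to compare against. The sketch below uses the standard ISBN-10 (weights 10 down to 1, mod 11, with "X" for 10) and ISBN-13 (alternating weights 1 and 3, mod 10) check-digit rules; `normalize_isbn` and its helpers are hypothetical names, not the tool's API. It converts valid ISBN-10s to ISBN-13 so that different spellings of the same identifier compare equal.

```python
import re
from typing import Optional


def isbn13_check_digit(first12: str) -> str:
    # Alternate weights 1 and 3 across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_is_valid(isbn10: str) -> bool:
    # Weights run 10 down to 1; a final "X" stands for 10.
    total = 0
    for i, ch in enumerate(isbn10):
        value = 10 if (i == 9 and ch in "Xx") else int(ch)
        total += value * (10 - i)
    return total % 11 == 0


def normalize_isbn(raw: str) -> Optional[str]:
    """Return the ISBN-13 form of `raw`, or None if it is not a valid ISBN."""
    # Strip prefixes such as "ISBN", "ISBN:", "ISBN-13:", then hyphens and spaces.
    digits = re.sub(r"^\s*ISBN(-1[03])?:?\s*", "", raw, flags=re.IGNORECASE)
    digits = re.sub(r"[\s-]", "", digits)
    if re.fullmatch(r"\d{9}[\dXx]", digits):
        if not isbn10_is_valid(digits):
            return None
        core = "978" + digits[:9]
        return core + isbn13_check_digit(core)
    if re.fullmatch(r"97[89]\d{10}", digits):
        return digits if isbn13_check_digit(digits[:12]) == digits[12] else None
    return None


# Example fixture table: (raw input, expected ISBN-13 or None).
CASES = [
    ("0-306-40615-2", "9780306406157"),               # hyphenated ISBN-10
    ("ISBN-13: 978-0-306-40615-7", "9780306406157"),  # prefixed ISBN-13
    ("ISBN 080442957X", "9780804429573"),             # ISBN-10 ending in X
    ("978-0-306-40615-8", None),                      # bad check digit
]
for raw, expected in CASES:
    assert normalize_isbn(raw) == expected, raw
```

Each real test PDF could then be paired with a row in a table like `CASES`, so that extraction results are checked against a normalized expected identifier.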
Setting up a benchmark would be helpful. Feedback and ideas for this are greatly appreciated!
See the Roadmap for planned improvements to these limitations.