Found any bugs? Please let us know by submitting a GitHub issue.
LLM Processing
- Non-deterministic - Each processing run may produce slightly different results due to how LLMs generate text.
- Field Stability - While core identifiers (ISBN, DOI, LOC) remain consistent, interpretative fields like author names, subtitles, years, and publisher details can vary between runs. See "What are the best use cases?" for guidance.
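One way to see which fields are unstable for your own documents is to process the same file several times and compare the results. The sketch below assumes each run yields a flat dictionary of metadata strings; the function name `unstable_fields` and the sample values are hypothetical and not part of the tool.

```python
from collections import defaultdict


def unstable_fields(runs):
    """Return the fields whose values differ between runs, with the values seen."""
    seen = defaultdict(set)
    for run in runs:
        for field, value in run.items():
            seen[field].add(value)

    result = {}
    for field, values in seen.items():
        # A field that is missing from some runs also counts as unstable.
        missing = any(field not in run for run in runs)
        if len(values) > 1 or missing:
            result[field] = sorted(values)
    return result


# Made-up sample output from two runs over the same PDF.
runs = [
    {"isbn": "9780000000002", "author": "Jane Q. Example", "year": "1988"},
    {"isbn": "9780000000002", "author": "Example, Jane", "year": "1988"},
]
print(unstable_fields(runs))  # {'author': ['Example, Jane', 'Jane Q. Example']}
```

Fields that never appear in the result across many runs are reasonable candidates for automated renaming; the rest may need manual review.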
File Operations
- PDF Writing - May produce unexpected metadata changes or file corruption. See Evaluation for testing strategies to avoid unexpected results.
- Filename Length - Limited to 255 characters to maintain compatibility across different operating systems. Currently tested on macOS only. Windows and Linux support is experimental - please report any issues on GitHub.
- Unicode Support - Special characters in filenames can trigger issues on certain operating systems.
- File Validation - Source PDF corruption scanning is not yet implemented.
- Language Support - English is the primary supported language. Support for non-Latin character sets and right-to-left languages is limited; Arabic, Chinese, Japanese, and Korean text may produce inconsistent results.
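The filename length limit above can be illustrated with a short truncation sketch. It is not the tool's actual implementation: the helper `truncate_filename` is hypothetical, and it assumes the limit is counted in UTF-8 bytes, which is how many filesystems measure name length and is the stricter reading for non-ASCII names.

```python
from pathlib import PurePath

MAX_FILENAME_BYTES = 255  # assumption: the limit is applied to the encoded name


def truncate_filename(name: str, limit: int = MAX_FILENAME_BYTES) -> str:
    """Shorten the stem so the UTF-8 encoded name fits in `limit` bytes.

    The extension (for example ".pdf") is always preserved.
    """
    suffix = PurePath(name).suffix
    stem = name[: len(name) - len(suffix)] if suffix else name
    budget = limit - len(suffix.encode("utf-8"))
    if budget <= 0:
        raise ValueError("extension alone exceeds the limit")
    encoded = stem.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded.decode("utf-8", errors="ignore") + suffix


long_name = "a" * 300 + ".pdf"
print(len(truncate_filename(long_name)))  # 255
```

Truncating on bytes rather than characters keeps names with accented or CJK characters within the limit, at the cost of shorter visible names.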
OCR and Vision Analysis
- Confidence Floor - Text segments with OCR confidence below 30% are automatically discarded. This is a subjective threshold and you’ll need to experiment with your own documents to find the best balance.
- Limited Coverage - Vision analysis is selective: it processes only the cover page, pages with high image content (>90%), early pages with poor OCR, and mixed-content layouts.
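The confidence floor described above amounts to a simple filter over OCR output. The following is a minimal sketch, assuming each text segment carries a confidence score in percent; the `Segment` class and `apply_confidence_floor` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    confidence: float  # OCR confidence in percent, 0-100


def apply_confidence_floor(segments, floor=30.0):
    """Keep segments at or above the floor and discard the rest."""
    return [s for s in segments if s.confidence >= floor]


segments = [
    Segment("Chapter 1", 92.0),
    Segment("l|l1I", 12.5),       # typical low-confidence noise
    Segment("Introduction", 30.0),
]
kept = apply_confidence_floor(segments)
print([s.text for s in kept])  # ['Chapter 1', 'Introduction']
```

Running the filter with a few different `floor` values on your own documents is a quick way to judge how much useful text a given threshold throws away.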
Metadata Extraction
- Edition Filtering - First-edition markers are automatically omitted from generated filenames. This may not be appropriate for all use cases. A configurable setting is planned for the next release.
- Metadata Transfer - Carrying over existing PDF metadata to the new file is not yet implemented.
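To make the edition filtering behavior concrete, here is a rough sketch of how a configurable version might look. Everything in it is an assumption: the function `edition_for_filename`, the `keep_first` flag, and the pattern used to recognize first editions are illustrative and do not reflect the planned setting.

```python
import re

# Hypothetical pattern: matches "1st", "First", "1st ed.", "First Edition", etc.
FIRST_EDITION = re.compile(
    r"^\s*(1st|first)(\s+(edition|ed\.?))?\s*$", re.IGNORECASE
)


def edition_for_filename(edition, keep_first=False):
    """Return the edition text to use in a filename, or None to omit it."""
    if not edition or not edition.strip():
        return None
    if not keep_first and FIRST_EDITION.match(edition):
        return None  # current behavior: first editions are left out
    return edition.strip()


print(edition_for_filename("First Edition"))                   # None
print(edition_for_filename("2nd Edition"))                     # 2nd Edition
print(edition_for_filename("First Edition", keep_first=True))  # First Edition
```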
Testing Limitations
- Test Coverage - The test suite is limited and focuses mainly on happy paths. Edge cases and error conditions need more coverage. See Test Cases for current scenarios.
- Real Test Files - Actual PDF files still need to be created for each test case, for example books with multiple ISBN formats and various edition formats. If you have specific documents, please open a discussion. Setting up a benchmark would also be helpful. Feedback and ideas for this are greatly appreciated!