A set of scripts for standardizing large collections of books, papers, and other published documents. We use a combination of local and cloud OCR, Vision Language Models (VLMs), and Large Language Models (LLMs) to extract metadata and generate consistent filenames. This project is a work in progress, both as a tool and as a learning project for AI-led agentic development.
Before
├── Androids Dream of Electric Sheep__English-242L.pdf
├── Quantum Computing Introduction MITPRESS_2011.pdf
├── Complexity ihn Physics- .pdf
├── GOODFELLOW_AVIAN (books about birds).pdf
├── j.physrep.2024.01.012.pdf
└── 10.1007-978-3-031-04083-2.pdf
After
├── Do Androids Dream of Electric Sheep, (Philip K. Dick), Doubleday, (1968).pdf
├── A Gentle Introduction to Quantum Computing, (Eleanor Rieffel), MIT Press, (2011).pdf
├── More Than the Sum of the Parts, Complexity in Physics and Beyond, (Helmut Satz), Oxford University Press, (2022).pdf
├── Avian Architecture, (Peter Goodfellow), Princeton University Press, 2nd Ed, (2024).pdf
├── Quantum Phase Transitions in Driven Systems, (Smith et al.), Physical Review, (2024).pdf
└── Emergence in Complex Networks, (Lee Johnson), arXiv, (2024).pdf
This project is a work-in-progress:
- Back up your PDFs
- Run the scripts iteratively on a small subset of your collection before scaling up
- Monitor your cloud API costs; the costs displayed by the scripts are only estimates
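The "After" filenames shown earlier appear to follow a `Title, (Author), Publisher, Edition, (Year).pdf` pattern. As a rough illustration of that pattern only, here is a minimal Python sketch. The function name `build_filename`, its parameters, and the character-stripping rule are assumptions made for this example, not the project's actual code.

```python
import re


def build_filename(title, author, publisher, year, edition=None):
    """Hypothetical helper: join metadata into
    'Title, (Author), Publisher, [Edition,] (Year).pdf'."""
    parts = [title, f"({author})", publisher]
    if edition:
        # The edition is optional and sits between publisher and year
        parts.append(edition)
    parts.append(f"({year})")
    name = ", ".join(parts)
    # Assumption: drop characters that are invalid on common filesystems
    name = re.sub(r'[\\/:*?"<>|]', "", name)
    # Collapse any runs of whitespace left behind
    name = re.sub(r"\s+", " ", name).strip()
    return name + ".pdf"


print(build_filename(
    "Avian Architecture",
    "Peter Goodfellow",
    "Princeton University Press",
    2024,
    edition="2nd Ed",
))
# prints: Avian Architecture, (Peter Goodfellow), Princeton University Press, 2nd Ed, (2024).pdf
```

In the real pipeline the metadata fields would come from the OCR, VLM, and LLM stages. This sketch covers only the final formatting step.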
Quickstart
Get up and running quickly
Key Concepts
The main ideas behind the project
Use Cases
Is this project right for you?