The open-source Tesseract OCR engine is the first step in our pipeline and extracts the bulk of the text from document pages. Since v4, Tesseract has used an LSTM (Long Short-Term Memory) neural network for recognition, combining traditional OCR techniques with modern neural networks.
Configuration
Scaling factor for image processing before OCR. Higher values increase resolution but
require more processing power.
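To make the trade-off concrete, here is a minimal sketch of how a scaling factor affects image size. The helper names `scaled_size` and `pixel_cost_ratio` are hypothetical and not part of the pipeline; the sketch also assumes the factor is applied uniformly to width and height, which this page does not state explicitly.

```python
def scaled_size(width: int, height: int, scale: float) -> tuple[int, int]:
    """Return the pixel dimensions of a page image after scaling by `scale`.

    Hypothetical helper: assumes the same factor is applied to both axes.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(width * scale), round(height * scale)


def pixel_cost_ratio(scale: float) -> float:
    """Return how many times more pixels the scaled image contains.

    Pixel count grows with the square of the factor, which is a rough proxy
    for the extra memory and processing the OCR step needs.
    """
    return scale * scale


# Example: a 1240x1754 page image (roughly A4 at 150 DPI) scaled by 2.
print(scaled_size(1240, 1754, 2.0))  # (2480, 3508)
print(pixel_cost_ratio(2.0))         # 4.0
```

Under this assumption, doubling the factor quadruples the number of pixels to process, so small increases are usually worth trying before large ones.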
Minimum confidence threshold for OCR results, expressed as a percentage. Pages whose
confidence falls below this threshold may be processed by the Vision Language Model if it is enabled.
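The threshold check described above could be sketched as follows. This is an illustration only: the function name `pages_needing_fallback`, its parameters, and the idea of one confidence score per page on a 0 to 100 scale are assumptions, not the pipeline's actual interface.

```python
def pages_needing_fallback(
    page_confidences: list[float],
    min_confidence: float,
    vlm_enabled: bool = True,
) -> list[int]:
    """Return 1-based page numbers whose OCR confidence is below the threshold.

    Hypothetical helper: `page_confidences` holds one score per page (0-100).
    A page exactly at the threshold is kept, since only pages *below* it
    are candidates for the Vision Language Model.
    """
    if not vlm_enabled:
        # Without the VLM there is nothing to fall back to.
        return []
    return [
        page
        for page, confidence in enumerate(page_confidences, start=1)
        if confidence < min_confidence
    ]


# Example: four pages, threshold of 70 percent.
confidences = [96.2, 58.4, 81.0, 70.0]
print(pages_needing_fallback(confidences, 70.0))                     # [2]
print(pages_needing_fallback(confidences, 70.0, vlm_enabled=False))  # []
```

A higher threshold sends more pages to the fallback step, trading extra processing for potentially better text on low-quality scans.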