Key Concepts
Matter Pages
The first and last pages of a document
Figure: Example matter pages from Machine Learning System Design
What are these pages?
Front matter and back matter are the first and last pages of a document.
These pages hold the raw material we feed to the LLM as grounded context, so it can predict an accurate filename and metadata.
We skip the body matter, the middle of the document, because it rarely contains the metadata we need.
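As a rough illustration of this selection, the sketch below picks zero-based page indices for the front and back matter and leaves out everything in between. The function name `select_matter_pages` and its parameters are hypothetical and only stand in for whatever the tool does internally; the two caps play the role of the front and back page limits.

```python
def select_matter_pages(
    total_pages: int, front_max: int, back_max: int
) -> tuple[list[int], list[int]]:
    """Return (front, back) zero-based page indices, skipping the body.

    Hypothetical helper for illustration; not the project's actual API.
    """
    if total_pages < 0 or front_max < 0 or back_max < 0:
        raise ValueError("page counts must be non-negative")

    # Front matter: the first `front_max` pages, or fewer for short documents.
    front_end = min(front_max, total_pages)

    # Back matter: the last `back_max` pages, never overlapping the front.
    back_start = max(total_pages - back_max, front_end)

    front = list(range(front_end))
    back = list(range(back_start, total_pages))
    return front, back


front, back = select_matter_pages(total_pages=300, front_max=10, back_max=5)
print(front)  # indices 0 through 9
print(back)   # indices 295 through 299
```

Clamping `back_start` to `front_end` is one possible design choice: for a short document, no page is sent to the LLM twice.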
Front Matter Pages
The first pages of a document commonly contain publication details and introductory content:
- Cover - Title, author[s], publisher logo
- Half-title - Title and subtitle only
- Recommended - Other/similar books by the same author[s]/publisher
- Title - Primary source for title and subtitle
- Copyright - Publication year, edition, DOI, LOC, ISBN[s]
- Letter from the author[s]
- Acknowledgements
- Preface
- Table of contents - Document structure and scope
The number of front matter pages to read is configurable with the MATTER_CONFIG.front.max_pages setting.
Back Matter Pages
The last pages of a document commonly contain supplementary information:
- Bibliography - References and citations
- Glossary - Key terms and definitions
- Appendices - Additional material and data
- Index/End notes - Subject coverage and annotations
- Author[s] bios - Detailed author[s] information
- Back cover - Marketing copy and additional metadata
The number of back matter pages to read is configurable with the MATTER_CONFIG.back.max_pages setting.
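To make the two settings concrete, here is a minimal sketch of how such a configuration could be shaped so that the dotted names `MATTER_CONFIG.front.max_pages` and `MATTER_CONFIG.back.max_pages` resolve as written. The class names `SectionConfig` and `MatterConfig` and the default values 10 and 5 are assumptions made for illustration; the project's real defaults and storage format may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SectionConfig:
    """Settings for one matter section (hypothetical shape)."""
    max_pages: int


@dataclass(frozen=True)
class MatterConfig:
    """Groups the front and back settings; default values are illustrative only."""
    front: SectionConfig = field(default_factory=lambda: SectionConfig(max_pages=10))
    back: SectionConfig = field(default_factory=lambda: SectionConfig(max_pages=5))


MATTER_CONFIG = MatterConfig()

# Dotted access mirrors the setting names used in this document.
print(MATTER_CONFIG.front.max_pages)
print(MATTER_CONFIG.back.max_pages)

# Overriding one section leaves the other at its default.
custom = MatterConfig(front=SectionConfig(max_pages=20))
print(custom.front.max_pages, custom.back.max_pages)
```

Raising `max_pages` gives the LLM more context, at the cost of a longer prompt, which can help with documents that have an unusually long preface or index.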