Back matter generally contains less of the metadata we need. It serves primarily as a fallback
when the title and at least one author name were not found in the front matter.
There are certain situations where you may want to process back matter differently.
never:
Skip back matter entirely, regardless of what’s found in front matter.
Fastest and most cost-effective option. Best for fiction and short stories.
always:
Process back matter regardless of what fields are found in front matter.
Best for documents where back matter contains detailed appendices,
like reports, technical papers, and research papers.
fallback:
Process back matter only if front matter is missing any of the following:
a title, at least one author name, or any field set to true
in MATTER_CONFIG.back.fields.
Best balance of speed and completeness. Will check back matter only if
key metadata is missing from front matter. This is the recommended setting
for most document collections with mixed formats and structures.
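
The three modes can be sketched as a single decision function. This is a minimal illustration of the behavior described above, not the project's actual code: the helper name `should_process_back_matter`, the shape of the `front_metadata` dict, and the `"title"` and `"author"` keys are all assumptions.

```python
def should_process_back_matter(mode, front_metadata, back_fields):
    """Hypothetical helper: decide whether back matter should be searched."""
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode != "fallback":
        raise ValueError(f"unknown back matter mode: {mode!r}")
    # Title and author are always required; add every field enabled in the config.
    required = ["title", "author"]
    required += [name for name, enabled in back_fields.items() if enabled]
    # Fall back as soon as any required value is missing or empty.
    return any(not front_metadata.get(name) for name in required)


# Placeholder values, for illustration only.
fields = {"publisher": False, "isbn": True}
front = {"title": "Example Title", "author": ["A. Author"]}
print(should_process_back_matter("fallback", front, fields))  # True: isbn is missing

front["isbn"] = "placeholder-isbn"
print(should_process_back_matter("fallback", front, fields))  # False: nothing is missing
```

Treating an empty value the same as a missing one keeps the check simple; a real implementation may validate each field more strictly.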
Which metadata fields to look for in back matter when they are not found in front matter.
These settings apply only when back matter processing is enabled, that is,
when MATTER_CONFIG.back.mode is set to fallback or always.
These settings are ignored if MATTER_CONFIG.back.mode is never.
Author and Title are the minimum required fields for metadata validation
and filename generation, and they are not configurable. In other words,
there must be something to name the file after.
This API shape is a work in progress and a bit unclear.
Will refactor to be more intuitive in a future update.

    # Never process back matter, regardless of what's found in front matter.
    # Note that even though ISBN is set to `True`, it won't trigger a
    # back matter search because back matter processing is disabled.
    MATTER_CONFIG = {
        "back": {
            "mode": "never",
            "max_pages": 5,  # no effect
            # no effect
            "fields": {
                "publisher": False,
                "year": False,
                "edition": False,
                "isbn": True,
                "doi": False,
                "loc": False
            }
        }
    }

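
For contrast with the `never` example, here is a sketch of a `fallback` configuration. It reuses the same keys, and the values are illustrative only. Going by the mode descriptions above, setting `isbn` to `True` here would trigger a back matter search whenever the front matter yields no ISBN.

```python
# Illustrative values only; the keys mirror the `never` example.
MATTER_CONFIG = {
    "back": {
        "mode": "fallback",
        "max_pages": 5,
        "fields": {
            "publisher": False,
            "year": False,
            "edition": False,
            "isbn": True,  # a missing ISBN in front matter triggers the fallback
            "doi": False,
            "loc": False,
        },
    },
}

# Fields that count toward the fallback check, in addition to title and author.
enabled = [name for name, on in MATTER_CONFIG["back"]["fields"].items() if on]
print(enabled)  # ['isbn']
```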
Determines how page numbers are expressed when naming diagnostic files. 1-based numbering
makes it easier to cross-reference diagnostic files against the page numbers of the source PDF.

    PAGE_NUM_OFFSET = 1
    MATTER_CONFIG = {
        "front": {
            "max_pages": 8,
        }
    }

Pages will be numbered 1-8. Front cover is page 1.
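
To make the offset concrete, the sketch below adds it to a zero-based page index when building a diagnostic filename. The `diagnostic_name` helper and the `front_page_NNN.txt` pattern are hypothetical; the source does not specify the real naming scheme.

```python
PAGE_NUM_OFFSET = 1
MAX_PAGES = 8  # mirrors MATTER_CONFIG["front"]["max_pages"] above


def diagnostic_name(page_index, offset=PAGE_NUM_OFFSET):
    """Hypothetical naming scheme: zero-based page index plus the configured offset."""
    return f"front_page_{page_index + offset:03d}.txt"


names = [diagnostic_name(i) for i in range(MAX_PAGES)]
print(names[0], names[-1])  # front_page_001.txt front_page_008.txt
```

With an offset of 0 the same pages would be labeled 000 to 007, which no longer lines up with the page numbers shown by a PDF viewer.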
Critical setting that controls whether generated metadata and filenames are written to
the PDFs in [project-root]/data/ that are not inside a
RUNTIME_IGNORE_DIR_NAME directory.
Only set this to True once you have run the script and confirmed suggestions are
acceptable. Review Evaluation for a full guide.
Back up your files before setting this to True. Many edge cases may cause unexpected
results. See Test Cases for cases we test for,
and Known Issues for those we know about.
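
The review-first workflow can be sketched as a dry-run guard. Everything named here is an assumption made for illustration: `WRITE_CHANGES` stands in for the write setting described above, `"ignore"` is a placeholder value for RUNTIME_IGNORE_DIR_NAME, and the two helpers are hypothetical. Only renaming is shown; writing metadata is left out.

```python
from pathlib import Path

RUNTIME_IGNORE_DIR_NAME = "ignore"  # placeholder value (assumption)
WRITE_CHANGES = False  # hypothetical stand-in for the write setting


def writable_pdfs(data_dir, ignore_name=RUNTIME_IGNORE_DIR_NAME):
    """Return PDFs under data_dir that are not inside an ignored directory."""
    data_dir = Path(data_dir)
    return sorted(
        p for p in data_dir.rglob("*.pdf")
        # Check only the directory components, not the filename itself.
        if ignore_name not in p.relative_to(data_dir).parts[:-1]
    )


def apply_suggestion(pdf_path, new_name, write=WRITE_CHANGES):
    """Rename the file only when writing is enabled; otherwise just report."""
    target = pdf_path.with_name(new_name)
    if not write:
        print(f"[dry run] {pdf_path.name} -> {target.name}")
        return pdf_path
    return pdf_path.rename(target)
```

Keeping the default at `False` means a first run only prints its suggestions, which matches the advice to review results and back up files before enabling writes.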