Simple paper-to-optical-archive tooling. It starts with scanning
each page and ends with three files for the archive:
 - document.pdf: PDF of scan, cleaned up, rotation fixed, OCRed text in extra layer
 - document.txt: OCRed text from document, for indexer/search engine
 - document.tar: original raw scans in case we need to reprocess later
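
Before moving a document to the archive it can be worth checking that all
three files are there. The following is only a sketch: check_archive is a
hypothetical helper, not one of the tools, and it assumes the outputs are
named <basename>.pdf, <basename>.txt and <basename>.tar.

```shell
# Hypothetical helper, not part of the tooling.
# Assumption: scan2page writes <basename>.pdf, <basename>.txt, <basename>.tar.
check_archive() {
    base=$1
    missing=0
    for ext in pdf txt tar; do
        # -s is true only for files that exist and are non-empty
        if [ ! -s "$base.$ext" ]; then
            echo "missing or empty: $base.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_archive document && echo "ready for archive"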

tools:
 - scanpage: scan a single page to a TIFF file, monochrome, 600 dpi
 - scanpagecolor: scan a single page to a TIFF file, color, 600 dpi
 - scan2page: convert set of scans to the above three files
 - scan2pagecolor: the same for color

Note: the scan2page tooling assumes it can just iterate over a linear list
of *.tiff files in directory order and bundle them together. The basename
of the file ends up as the "page number/name" in the PDF.
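
To preview the page order and page names before bundling, something like the
sketch below may help. It is an illustration only: list_pages is a
hypothetical helper, and it assumes the files are picked up with a plain
shell glob, which may not be how scan2page actually walks the directory.

```shell
# Hypothetical helper, not part of the tooling.
# Assumption: pages are found with a plain *.tiff glob; scan2page may differ.
list_pages() {
    dir=$1
    for f in "$dir"/*.tiff; do
        [ -e "$f" ] || continue   # glob matched nothing: skip the literal pattern
        basename "$f" .tiff       # strip directory and suffix: "012.tiff" -> "012"
    done
}
```

Usage: list_pages . prints one page name per line, in glob order. A shell
glob sorts names as text, so 10.tiff comes before 2.tiff; zero-padded names
such as 002 and 010 avoid that.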

workflow:
 - ./scanpage 1
 - ./scanpage 2
 - ./scanpage ...
 - ./scan2page (this asks for a basename)
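
For longer documents the steps above can be wrapped in a loop. This is a
minimal sketch under assumptions: scan_document and the SCANPAGE/SCAN2PAGE
variables are made up for illustration, and zero-padding the page name is a
choice made here so that names sort in page order, not something the tools
require.

```shell
# Hypothetical wrapper around the workflow, not part of the tooling.
# SCANPAGE and SCAN2PAGE default to the tools in the current directory.
SCANPAGE=${SCANPAGE:-./scanpage}
SCAN2PAGE=${SCAN2PAGE:-./scan2page}

scan_document() {
    pages=$1
    i=1
    while [ "$i" -le "$pages" ]; do
        name=$(printf '%03d' "$i")          # 1 -> 001, 12 -> 012
        printf 'Place page %s on the scanner, then press Enter: ' "$name" >&2
        read -r reply || return 1           # stop if input is closed
        "$SCANPAGE" "$name" || return 1     # stop on the first failed scan
        i=$((i + 1))
    done
    "$SCAN2PAGE"                            # this asks for a basename
}
```

Usage: scan_document 12 scans pages 001 to 012 and then runs scan2page.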

