Scanning is becoming easier

9 June 2025 • Persistent link: iarccum.org/?p=5240

Scanning is becoming easier. The earliest scans in this archive were made in 2008 using a flatbed scanner. The resulting 18,000 individual JPEGs were later inserted into PDF files for posting. By 2013, when scanning at the Anglican Centre in Rome, I was able to use a photocopier to create a PDF directly, but these scans had a resolution of only 200 dpi and no colour or lighting correction.

In 2016, I used a DSLR camera in the Lambeth Palace Library to photograph 1,200 pages of documents. These later needed to be cropped and inserted into PDFs; however, the JPEGs were of high quality, and colour correction was a little easier. The quality was sufficient for the new Optical Character Recognition (OCR) software to produce some full text, but significant proofreading was required.

In 2019, I spent a weekend at the Anglican Centre in Rome, where I experimented with Adobe Scan on a tablet. Adobe Scan includes automatic lighting and colour correction, deskewing, OCR, and PDF creation. The only problem was that it took much longer to scan each document and to transfer it to my backup disk.

In 2024, I scanned numerous documents at the Anglican Communion Office using a Toshiba photocopier, saving them onto a flash drive. I scanned at 600 dpi in black and white, directly to PDF. In many cases, the documents could be fed through the sheet feeder, allowing many double-sided pages to be scanned quickly. The Toshiba scanning algorithm adjusts lighting better than Adobe or any of the earlier scanning techniques. The Toshiba can also perform OCR during the same scan, but its OCR is not as advanced as that of Adobe Acrobat.
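
To give a rough sense of the difference between a 200 dpi scan and a 600 dpi scan, the short Python sketch below estimates the pixel dimensions of a page at each resolution. It assumes an A4 page (about 8.27 by 11.69 inches), which may not match the actual documents, and the function names are purely illustrative.

```python
# Rough estimate of scan size in pixels at a given resolution.
# Assumption: an A4 page, roughly 8.27 x 11.69 inches.
A4_INCHES = (8.27, 11.69)


def pixel_dimensions(dpi, page_inches=A4_INCHES):
    """Return (width, height) in pixels for a page scanned at `dpi`."""
    width_in, height_in = page_inches
    return round(width_in * dpi), round(height_in * dpi)


def pixel_ratio(dpi_high, dpi_low):
    """Return how many times more pixels the higher resolution captures."""
    # Resolution applies in both directions, so the ratio is squared.
    return (dpi_high / dpi_low) ** 2


if __name__ == "__main__":
    for dpi in (200, 600):
        width, height = pixel_dimensions(dpi)
        print(f"{dpi} dpi: {width} x {height} pixels")
    print(f"600 dpi captures {pixel_ratio(600, 200):.0f} times the pixels of 200 dpi")
```

Under these assumptions, tripling the resolution captures nine times as many pixels per page, which is one reason higher-resolution scans tend to give OCR software more detail to work with.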

I now load each PDF into Acrobat Pro and use the Compress features to optimise, deskew, OCR, and compress the files. This takes only a few moments, even for long documents. Hopefully, this will make the PDF texts easier to search using online tools such as Google. It does make it easier to copy text from these documents, although proofreading is always necessary, especially for older documents using unfamiliar fonts.