News

Giving Tuesday: Digitizing Philatelic Literature

12/1/2020

Donate to the Adopt-A-Book campaign on Giving Tuesday!

One of the questions we receive most often in the library about our digital collections database, APRL Digital, is how exactly the library digitizes philatelic literature, from the initial paper copy to the fully digital, searchable documents that appear in the database. The step-by-step process is far more complex than can be detailed here, but we can explain its main stages: scanning, image assessment and quality control, approval, and finally previewing and saving files for permanent upload.

When we began the APRL Digital project in 2017, we chose OCLC’s ContentDM as the software platform. ContentDM is a robust and versatile digital collection management system used by many libraries and institutions, among them the Library of Congress, Smithsonian Libraries and the USDA Library, as well as many of the nation’s leading universities and colleges. The platform is specially adapted to handle large and varied collections that include books, journals, photos, videos and other data formats. ContentDM will enable the APRL to keep growing the library’s digital collection in both size and scope.

The first stage of digitizing philatelic literature and making it available through APRL Digital, as many of you already know, is manually scanning the document itself. We are fortunate here at the APRL to have (through the Mighty Buck program) three book-edge scanners (two in the public space and one for staff use only) and one overhead scanner, which together enable us to scan a wide variety of materials. The protocol we use for the APRL Digital project is to scan text-only pages at 300 dpi in color and pages with both text and images at 600 dpi in color. We save the pages as TIFF files, as this file type provides the greatest image depth and text clarity while maintaining a workable file size. If a particular image on a page needs a higher resolution, we can increase it up to 1200 dpi.
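
As a rough illustration of the scanning protocol, the sketch below picks a resolution for a page and estimates the size of the resulting image. The function names are hypothetical, and the size estimate assumes an uncompressed 24-bit color image of a letter-size (8.5 x 11 inch) page; real TIFF files may be smaller if compression is used.

```python
from typing import Optional

# Resolutions taken from the protocol described above.
TEXT_ONLY_DPI = 300
TEXT_AND_IMAGES_DPI = 600
MAX_DPI = 1200


def choose_dpi(has_images: bool, detail_dpi: Optional[int] = None) -> int:
    """Pick a scan resolution for one page (hypothetical helper)."""
    base = TEXT_AND_IMAGES_DPI if has_images else TEXT_ONLY_DPI
    if detail_dpi is None:
        return base
    # A special request never goes below the base setting or above 1200 dpi.
    return min(max(detail_dpi, base), MAX_DPI)


def uncompressed_size_mb(width_in: float, height_in: float, dpi: int,
                         bytes_per_pixel: int = 3) -> float:
    """Estimate the uncompressed image size in megabytes.

    Assumes 24-bit color (3 bytes per pixel) and no compression.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bytes_per_pixel / 1_000_000


text_page_mb = uncompressed_size_mb(8.5, 11, choose_dpi(has_images=False))
image_page_mb = uncompressed_size_mb(8.5, 11, choose_dpi(has_images=True))
print(f"text-only page: about {text_page_mb:.1f} MB")
print(f"page with images: about {image_page_mb:.1f} MB")
```

Under these assumptions, doubling the resolution from 300 to 600 dpi quadruples the number of pixels, which is one reason to reserve the higher settings for pages that need them.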

Once we have the scanned files, all of which follow a specific file naming schema, we go through the images for quality control. Images are checked for clarity and sharpness, orientation (the scanning software straightens most images but occasionally orients a page improperly), missing pages (whether skipped during scanning or absent from the original document), color problems (i.e. too dark, too light or an improper hue) and any page or image resizing that is recommended.
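
One of these checks, looking for missing pages, lends itself to a simple script once files follow a consistent naming schema. The sketch below is only an illustration: the schema it assumes (a prefix followed by `_p` and a four-digit page number, such as `AP_2020_12_p0001.tif`) is hypothetical and is not the schema the library actually uses.

```python
import re

# Hypothetical naming schema: <prefix>_p<4-digit page number>.tif (or .tiff)
PAGE_PATTERN = re.compile(r"^(?P<prefix>.+)_p(?P<page>\d{4})\.tiff?$", re.IGNORECASE)


def find_page_problems(filenames):
    """Return (missing_pages, badly_named) for one batch of scans."""
    pages = set()
    badly_named = []
    for name in filenames:
        match = PAGE_PATTERN.match(name)
        if match:
            pages.add(int(match.group("page")))
        else:
            badly_named.append(name)
    if not pages:
        return [], badly_named
    # Only gaps between the lowest and highest page found can be detected.
    expected = range(min(pages), max(pages) + 1)
    missing = [page for page in expected if page not in pages]
    return missing, badly_named


batch = [
    "AP_2020_12_p0001.tif",
    "AP_2020_12_p0002.tif",
    "AP_2020_12_p0004.tif",
    "scan17.tif",
]
missing, badly_named = find_page_problems(batch)
print("missing pages:", missing)
print("files not matching the schema:", badly_named)
```

A check like this can only flag gaps in the numbering. Whether a page was skipped during scanning or is absent from the original document still has to be confirmed by looking at the paper copy.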

Once the page images have been examined, we load the scans in batches into ContentDM’s Project Client software, where a number of different processes take place. The most relevant are these: the pages are run through OCR (Optical Character Recognition) so that they are full-text searchable; they are organized together as issues in the case of journals and chapters in the case of books; they are indexed using the OCR text; and finally, metadata (bibliographic data for books and journals, plus volume, issue and page numbers in the case of a journal) is created for each individual file batch (i.e. journal issue or book).

This stage of the process often takes several hours to complete, depending on the batch size. We often run it in the background while doing other library work during the day and then let it continue overnight, with the computers left on after the work day is done. Before these batches are permanently saved, they must be approved through the Project Client. Once approved, the files can be previewed and then permanently saved for upload into the database. For an average journal issue of 50-70 pages, the process from initial scanning to final upload takes about 3-4 hours, sometimes less, sometimes more.
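
To put those figures in per-page terms, here is a small back-of-the-envelope calculation. Pairing the shortest time with the longest issue (and the reverse) is our own assumption, used only to bracket the range, and much of this time is unattended batch processing rather than hands-on work.

```python
def minutes_per_page(hours: float, pages: int) -> float:
    """Convert a total processing time into minutes per page."""
    return hours * 60 / pages


fastest = minutes_per_page(3, 70)  # shortest time, longest issue
slowest = minutes_per_page(4, 50)  # longest time, shortest issue
print(f"{fastest:.1f} to {slowest:.1f} minutes per page")
# prints "2.6 to 4.8 minutes per page"
```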

This Giving Tuesday, when you donate $50 or more to the Adopt-A-Book campaign, your contribution directly assists us in digitizing philatelic literature as described above. The process outlined is costly. The ContentDM software has an annual fee (for upgrades and maintenance of the database), and there are also server and storage costs. The scanning, quality control and upload of documents all come with their own costs, whether done by volunteers, digital interns or library staff. This Giving Tuesday is about jumpstarting this process and continuing to maintain and grow the digital collections database for the future.

Double Your Impact through the Generosity of our Matching Donors!

On December 1st only, your impact will be doubled thanks to the generosity of our three matching donors, who have pledged to match up to $7,000 of your gifts, dollar for dollar. That means your $50 instantly becomes $100 and will go a long way toward helping us grow the size and scope of the APRL digital collections database, providing worldwide access to the APRL.