Digitizing Historical Newspapers from Microfilm

Metadata Creation for Ellsworth American Microfilm

Maine State Library staff get first peek at “new” microfilm that will be used for digitization.

Several boxes of newly duplicated microfilm recently arrived at the Maine State Library in preparation for the next phase of our newspaper digitization project.  The work is the result of a continuation of funding through the National Digital Newspaper Program and will involve the imaging of an additional 100,000 pages of historical newspapers from the following titles:

  • Ellsworth American (Ellsworth)
  • Kennebec Journal (Augusta)
  • Somerset News and Independent Reporter (Skowhegan)
  • Daily Northern Tribune (Bath)
  • Eastern Times (Bath)
  • Lincoln Telegraph (Bath)
  • Jenks’ Portland Gazette (Portland)

Although the digitization of the film itself is outsourced to a company that specializes in capturing high-resolution images from microfilm reels, the bulk of the work on either end of the project falls to our staff: tracking down suitable film, documenting information about the content that will be imaged, and conducting quality control checks before the digitized collection is posted online.

We frequently get questions about the process, the timing, and the considerations that go into choosing film for digitization, so I thought it might be helpful to share a basic overview of how it all works.

Finding Suitable Film

The process starts with identifying suitable microfilm that we can use for digitization.  While there is plenty of film stored in the holdings of libraries and historical societies throughout Maine, most of that media is third-generation film that was created and used as service copies.  Over the years, those service copies inevitably degrade in quality due to the wear and tear that comes from being repeatedly loaded and read on film viewers.

For the film we digitize through the National Digital Newspaper Program, we need to track down either the first-generation master film (the film that came directly out of the camera that did the original imaging) or the print master film (a second-generation duplicate of the master that was used to create service copies).  These versions of the film are more difficult to source.  Sometimes the masters are held by the company that created the film, or they belong to the newspaper or library that originally paid to have the film created.  Unfortunately, for many Maine newspapers, we’ve been unable to locate any master film holdings.

If we are able to arrange a loan of the film and decide to proceed with a digitization project, we contract with an imaging company to make a series of duplicates from the borrowed film and those duplicates are what we use for the actual digitization.  When the project is finished, the duplicate reel sets will be kept at the Library of Congress and Maine State Library so that the collection can be accessed in the event that film might need to be re-digitized in the future.

Why Quality Film Matters

In order for the digitization to be most useful, it needs to be text searchable.  That’s why whenever we digitize collections that include printed text, we run the imaged content through specialty software that performs optical character recognition (OCR) to identify the location and sequence of every character included on the page.

OCR is particularly challenging when working with microfilm because it involves working with images that were often created long ago and stored in a miniaturized format.  Even the smallest scratches or imperfections on the film can make it difficult for the human eye to discern text on the blown-up image from the film.  These same types of image irregularities also wreak havoc on the software that is trying to make sense of the character sets and complex column formats that are commonplace in historical newspapers.

Even master film can have its problems.  Images captured from bound newspapers may have content lost in the shadows or curvature of the binding.  Parts of pages may be illegible due to poor lighting conditions, bad equipment or carelessness on the part of the photographer when the images were captured from the original newsprint.

In the event that quality film is not available, there may be an opportunity to digitize directly from original newspapers.  While digitizing from original papers can yield much better images than what we would normally find on microfilm, the process is more labor-intensive and is not generally included within the scope of work that is supported by the National Digital Newspaper Program.  That said, we have equipment here to digitize original newspapers and have pulled together some significant digital collections with the help of a few dedicated project volunteers.  I’ll try to make a point of talking more about how we digitize original newspapers in a future update.

The Work Ahead

The picture at the top of this post was captured as a few of us gathered last week to take a close look at the film that we will be working with from the Ellsworth American.  Our plan is to start with this particular collection with the goal of having the earliest issues of the paper online later this year or early next year.

But there’s a lot of work ahead of us before the film will even be shipped off for digitization.  In the coming months, we will be engaged in the labor-intensive process of inspecting each image on each reel to capture page-level information about the collection and to note any irregularities in the ordering of content, duplicate or missing images, and problems with image quality.
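
For readers curious what a page-level check might look like in practice, here is a minimal sketch in Python.  It is not the tool we use; the record fields (reel, frame, issue_date, page, note) and the function name are hypothetical, chosen only to illustrate how duplicate and missing pages within an issue could be flagged once each image has been logged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    """One logged microfilm image. All field names are illustrative assumptions."""
    reel: str         # hypothetical reel identifier
    frame: int        # position of the image on the reel
    issue_date: str   # date of the issue in ISO form, e.g. "1900-01-05"
    page: int         # page number printed on the newspaper
    note: str = ""    # free-text remark about image quality


def find_irregularities(records):
    """Return {issue_date: {"duplicates": [...], "missing": [...]}} for problem issues."""
    pages_by_issue = defaultdict(list)
    for rec in records:
        pages_by_issue[rec.issue_date].append(rec.page)

    problems = {}
    for issue_date, pages in pages_by_issue.items():
        seen = set(pages)
        # A page number logged more than once within the same issue.
        duplicates = sorted({p for p in pages if pages.count(p) > 1})
        # Gaps between page 1 and the highest page number that was logged.
        missing = [p for p in range(1, max(pages) + 1) if p not in seen]
        if duplicates or missing:
            problems[issue_date] = {"duplicates": duplicates, "missing": missing}
    return problems
```

A check like this can only find gaps below the highest page number that was recorded, so a missing final page would still have to be caught by eye.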

Once the information is captured, the reels will be shipped out in batches to a vendor that will digitize the content and run the images through specially trained OCR software so that the full text can be searched.

Their final product will come back to us in batches of 10,000 or more images stored on hard drives.  We will take another close look at the files for image quality and begin the process of batching together pages as issue-level PDF files so that the content can be searched and stored in the Digital Maine Repository.  Copies of the images will also be sent to the National Digital Newspaper Program, where the collection will become searchable in the Chronicling America portal.
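
As a rough illustration of the batching step, the sketch below groups page image files by issue so that each group could then be assembled into a single PDF.  The file naming pattern (title, date, and page number separated by underscores) is purely an assumption for the example and does not describe the files our vendor actually delivers; the PDF assembly itself is left out.

```python
import re
from collections import defaultdict

# Hypothetical naming pattern: <title>_<YYYY-MM-DD>_p<page>.tif
FILENAME = re.compile(
    r"^(?P<title>[a-z_]+)_(?P<date>\d{4}-\d{2}-\d{2})_p(?P<page>\d+)\.tif$"
)


def group_by_issue(filenames):
    """Group page files by (title, issue date), ordered by page number.

    Returns (grouped, unmatched), where unmatched lists any file names
    that do not follow the assumed pattern and need a manual look.
    """
    issues = defaultdict(list)
    unmatched = []
    for name in filenames:
        match = FILENAME.match(name)
        if match is None:
            unmatched.append(name)
            continue
        key = (match["title"], match["date"])
        issues[key].append((int(match["page"]), name))

    # Sort each issue's pages numerically, then keep only the file names.
    grouped = {key: [name for _, name in sorted(pages)] for key, pages in issues.items()}
    return grouped, unmatched
```

Sorting on the numeric page value rather than the file name keeps page 10 from landing ahead of page 2 if the numbers are not zero-padded.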

We are always looking for volunteers to help us with our newspaper digitization initiatives.  If any of this sounds interesting to you, feel free to get in touch and we’d be happy to bring you into the fold.
