Thursday, September 07, 2006

Digitizing Books & Completeness

Tim O'Reilly at O'Reilly Radar has an interesting guest post from Judith Sutherland of Distributed Proofreaders (a group of which I was unaware). DP is a group of volunteers who do the quality checking and correction for Project Gutenberg (which is a wiki now...cool).

Judith writes:
"Something I've wondered about and haven't seen discussed anywhere is the need (or lack thereof) for quality, mostly in the sense of completeness, in the mass digitization programs. The volunteers at DP report many missing pages in the Google books, particularly near illustrations, as well as that none of the Google books make available color scans of any illustrations."

Considering the heavy traffic recently on Web4Lib on this very topic, I was interested that Tim and Judith, both not of LibraryLand, have not heard from librarians who share their concerns about completeness and quality in the various book digitization projects. What this suggests is that our opinions are not getting out into the larger community of people participating in various ways in such projects.

Why not? Do we talk only amongst ourselves? If this is the case, more of us need to be blogging, writing opinion pieces for newspapers, leaving comments on the blogs of non-LibraryLand people writing in areas of mutual interest, and attending and speaking at conferences outside of the usual ones. Don't we have expertise and ideas to share?

So, how about a little less kvetching in the family, and a little more effort to be heard outside the compound?

1 comment:

Jill Hurst-Wahl said...

I actually addressed Google's digitization quality in my blog in Nov. 2005 (here). I also noted in January 2006 that a Business Week article had discussed this.

What is amazing to me is that by and large the lack of quality does not seem to be an overwhelming concern in the industry. Perhaps it is because people see this as an index that will point them to content, after which they can find a more suitable version to read. Of course, if pages are missing, the index is incomplete.

I had hoped that we would learn from Google about world-class digitization processes, but that is not happening. Instead, it is likely that in the short term we are learning about things not to do.