Friday, March 02, 2007

Side Streets

Till then I walk the side streets home,
Even when I’m on my own
“Side Streets” – Saint Etienne (Web site ; Wikipedia entry) from the album Tales from Turnpike House (Wikipedia entry)

Turnpike House is a real building in the London area, and the songs on Saint Etienne’s concept album, Tales from Turnpike House, weave flashes of several fictional characters’ lives set in flats in the building. Reviews (sample reviews: 1, 2, 3) offer more backstory than I’ll offer here, but let’s just say the reviewers and I concur: Tales is the exception to the rule – this concept album actually works.

And speaking of concepts that work: OCLC Research has been working on a project called WorldCat Identities. Chief Scientist Thom Hickey – my boss – has led the effort to build the infrastructure that automatically generates one HTML page per identity in WorldCat (an identity being a character, person, animal, organization, etc. referenced in selected fields of a bibliographic record) – about 19 million unique identities at last count. The pages draw on bibliographic data in WorldCat, used in conjunction with authority file data, to provide information about and list works by the identity, reveal related identities, display publishing patterns, and offer whatever other information of interest we can mine and display. And the works listed link to – you guessed it –

Like many projects OCLC Research has undertaken in recent years, WorldCat Identities builds on prior work by OCLC and RLG. WorldCat Identities draws inspiration from RLG’s RedLightGreen and leverages FRBR (work clustering), Audience Level (surmising audience), VIAF (linking common identities in diverse authority files), NameFinder (user-typo-tolerant searching support), and Dewey Browser (DDC made visual), and makes use of SRU, a protocol that OCLC has worked with the Library of Congress and others to develop. And, of course, WorldCat – the collective work of thousands of libraries and tens of thousands of librarians – is the key data source.

Thom’s various entries on his blog, Outgoing (see this entry and later posts), and posts by Lorcan and Walt have talked about the project. Tim O'Reilly gave it a nice write-up on O’Reilly Radar. And WorldCat Identities has also been mentioned by a number of other blogs (see the end of this post for links to posts I’ve found).

The attention is gratifying and confirms the generally positive reactions and excitement that face-to-face demonstrations of iterations of the project have engendered when Thom has presented WorldCat Identities in various settings. We’re also delighted to be working with our RLG Programs colleagues and several RLG partner institutions to get some early, expert feedback on WorldCat Identities.

As a fallen cataloger and recovering reference librarian, I’ve been impressed with WorldCat Identities in many ways. It leverages libraries’ investments in bibliographic and authority data. Each page is just the sort of by-and-about presentation to make undergraduate-doing-a-paper-about-a-person reference transactions go much faster than helping the user assemble some version of the same on their own. And the links to related identities offer a very addicting experience for the curious – the “side streets” are many and often quite interesting. Some nice examples of pages that work well:

But all is by no means perfect. Searching the names listed above in WorldCat Identities returns search results that show variations in how the names have been recorded in bibliographic records – some differences in form of name no doubt reflect different authoritative forms of name adopted by various communities (and VIAF offers the potential means to link multiple authoritative forms efficiently), but more than a few of the variations arise from errors in the underlying data, errors that keep apart things that should be put together or, alternatively, put together different persons and their works as a single identity.

There is also some not-quite-as-expected-by-the-user ordering and ranking of works associated with some identities (see for example Elvis Presley), but it’s not so obvious how to “fix” many of these unexpected results (the criteria applied make logical sense for most pages) – tinkering with ranking often fixes one case only to break many, many more. And, of course, for those music lovers among us, it’s wonderful to find so many persons involved with music, but at this stage in the project, WorldCat Identities does not yet include corporate headings, so no musical groups are given their own pages (and yes, you may spot a few, but they’re not really supposed to be there – these reflect a small but visible corpus of MARC tagging errors). Note that corporate identities will be added at some point – it’s a research project, after all, and we didn’t put in every feature we’d planned on day one.

So I invite you, gentle reader, to try WorldCat Identities out and let us know what you think. And if you find some especially compelling side streets, please leave a comment on this post so we can all retrace your steps.

{Posts relating to WorldCat Identities in various blogs (feel free to add more references via the comments – apologies in advance to those I might have missed): English-language: Baby Boomer Librarian, Catalogablog, DigitalKoans, Family Man Librarian, Household Opera, Library Stuff, Notional Slurry, PersonaNonData, Thinking Out Loud, Tom Keags, Vacuum, Wikimetrics ; French-language: Figoblog ; Italian-language: Fermo 2003 ; Japanese-language: Current Awareness Portal ; Romanian-language: ProLibro. Wikipedia articles incorporating WorldCat Identities links: Bill Clinton, Brad Pitt}

Photo: Doorways in the French Concession area of Shanghai. (c2004 Eric Childress)
