A Possible (and amateur) Model for Library Catalogs

I got the chance to attend the IFLA 2016 WLIC in Columbus, Ohio (my home city), where I sat in on a session concerning metadata standards for rare books librarianship. Over the course of the discussions, one gentleman raised the idea of moving away from traditional notions of catalog records and toward a more universal digital surrogate record. Unfortunately I cannot recall his name or institutional affiliation, but his comments sparked the idea outlined below, which has been marinating in my brain ever since.

First of all, I’d like to give the disclaimer that I am not an institutional repository librarian, a scholarly publishing librarian, or a metadata librarian, nor do I possess the technical knowledge to implement the model outlined here. I am simply a library support staff member who enjoys reading about these topics.

In the race to implement a variety of digital services, the library services landscape has ended up as a fairly fragmented environment, with institutional repositories, electronic journals, MARC surrogate records, digital finding aids, and digital libraries often cropping up as individual projects with wildly varying scope, longevity, and update schedules. I am mostly thinking of Ohio State University Libraries when conceptualizing this, as it is where I work. Our new director (Damon Jaggars) has discussed our outdated digital infrastructure, so it could be that other academic libraries already have a more robust and unified system in place on their websites, but I feel the ideas could be applicable nevertheless.

At present OSU uses WorldCat Local for our catalog search engine, a situation that has caused no small amount of frustration for OSUL staff and faculty as well as for our patrons. What Damon would like to implement instead is a Hydra software implementation that allows searching of multiple Solr indexes simultaneously: you enter a query into a single search box, and it is run against all of the Solr indexes at once. What can be indexed in this fashion is all of the normal digital services one would imagine an academic library providing: the library catalog, the institutional repository, the library website, licensed databases, etc. An example of such a system can be seen at Damon’s prior appointment, Columbia University Libraries. I see such a system as a stepping-stone to a truly unified, linked data environment.
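To make the single search box idea concrete for myself, here is a minimal sketch in Python. It is not how Hydra or Solr actually work: the three "indexes" are just in-memory lists that I made up to stand in for separate Solr indexes, and the names `Hit`, `INDEXES`, and `unified_search` are my own inventions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    source: str  # which index the result came from
    title: str


# Hypothetical stand-ins for separate Solr indexes. Real indexes would be
# queried over the network rather than held in a dictionary.
INDEXES = {
    "catalog": ["Moby-Dick", "A History of Whaling"],
    "repository": ["Whaling Economics: A Thesis"],
    "website": ["Library Hours", "Whaling Exhibit Opens"],
}


def unified_search(query: str, indexes: dict[str, list[str]] = INDEXES) -> list[Hit]:
    """Run one query against every index and merge the hits, labeled by source."""
    needle = query.casefold()
    hits = []
    for source, titles in indexes.items():
        for title in titles:
            if needle in title.casefold():
                hits.append(Hit(source, title))
    return hits


# One query, results from every service, each tagged with where it lives.
for hit in unified_search("whaling"):
    print(f"[{hit.source}] {hit.title}")
```

The point of the `source` label is that the patron sees results from every service in one list, though as discussed next, a label alone may not tell them much.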

We’ve been hearing for ages that library users prefer the utterly simple, single search box that Google utilizes, and I think the Hydra/Solr implementation gets us closer to that, but doesn’t close the gap. While seeing a highly stripped down search interface as a goal in and of itself is probably a mistake, I think there is certainly progress we can make toward simplifying our search interfaces. One of the problems that the Hydra/Solr system is trying to surmount is the siloing off of our various services. For instance, if you want to find something in the Knowledge Bank at OSU, you have to go into that individual service and use its search interface; it can’t be done straight from the homepage of the library. Having that indexed and searchable as part of the Hydra implementation certainly makes those items more available, but patrons may still be confused about what exactly they are seeing if they don’t understand what the Knowledge Bank is to begin with or what the context is for those results.

What I’m envisioning instead is a “digital object record” (DOR) that acts as the fundamental record for all things listed in the library database and can be searched in a “unified search engine” (USE). DORs need to be flexible, relational, and shareable. First, the need for flexibility. A DOR can be any discrete digital object that a library patron is likely to run across by using the library’s services. So a DOR can be a catalog surrogate record for a monograph housed in the stacks. It can be a blog post on the library’s website. It can be a finding aid developed by the Special Collections Cataloging department. It can be an author (essentially taking on the role of an authority record). It can be a collection.

This need for flexibility then touches on the need for interconnected, relational records. A DOR for an electronic thesis housed in the institutional repository would be connected to the author’s DOR, as well as to collection DORs for all the ETDs published in that specific year, all the ETDs belonging to a specific discipline, and all the ETDs published by doctoral candidates. These connections should be implemented in deeply practical ways. Let’s say the author of the aforementioned thesis became a faculty member at the same institution. The author’s DOR could list their email, university webpage, even Twitter handle or personal website as well. It could even go on to list what courses the faculty member is teaching during the current semester, and their office location and hours. Dublin Core could be an excellent model for the usage of relational metadata terms.
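As a rough sketch of what relational records might look like, the Python below links a thesis to its author and to two collections. Everything here is an assumption on my part: the `RecordGraph` class, the record ids, and the sample people and collections are made up. The relation names `creator`, `isPartOf`, and `hasPart` are borrowed loosely from Dublin Core terms, while `isCreatorOf` is my own placeholder for the reverse direction.

```python
from collections import defaultdict


class RecordGraph:
    """A toy store of records plus named, two-way links between them."""

    def __init__(self):
        self.records = {}              # record id -> (name, kind of object)
        self.links = defaultdict(set)  # record id -> {(relation, other id)}

    def add(self, rec_id, name, kind):
        self.records[rec_id] = (name, kind)

    def relate(self, source, relation, target, inverse):
        # Store the link in both directions so either record can find the other.
        self.links[source].add((relation, target))
        self.links[target].add((inverse, source))

    def related(self, rec_id, relation):
        """Return the ids linked to rec_id by the given relation, sorted."""
        return sorted(t for r, t in self.links[rec_id] if r == relation)


graph = RecordGraph()
graph.add("etd:1", "Whaling Economics", "thesis")
graph.add("person:1", "Doe, Jane", "author")
graph.add("coll:2016", "ETDs published in 2016", "collection")
graph.add("coll:doctoral", "Doctoral ETDs", "collection")

graph.relate("etd:1", "creator", "person:1", inverse="isCreatorOf")
graph.relate("etd:1", "isPartOf", "coll:2016", inverse="hasPart")
graph.relate("etd:1", "isPartOf", "coll:doctoral", inverse="hasPart")

print(graph.related("etd:1", "isPartOf"))        # the collections the thesis sits in
print(graph.related("person:1", "isCreatorOf"))  # everything this author created
```

Storing each link in both directions is what lets a patron start from the author and reach the thesis, or start from the thesis and reach the author.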

Another important aspect of the need for flexibility concerns the DOR’s metadata. The core required fields for the metadata should be minimal: only a title field and an object type field. The semantics here are troublesome, because a DOR for an author with a title listing is inherently confusing; it’s a misuse of the word “title.” We’re instead going to go with the term “primary descriptor” (PD). The object type would be similar to how the general material designations work now in AACR2, a means for easily identifying what type of item you are looking at. Both fields should be displayed prominently so that the patron can tell, for instance, the record for the author itself from the record for one of the author’s books.
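To illustrate the two required fields, here is a small Python sketch. The class name `DigitalObjectRecord`, the field names, and the `label` helper are all hypothetical; I am only trying to show the shape of a record in which the primary descriptor and object type are mandatory and everything else is optional.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObjectRecord:
    primary_descriptor: str  # required: a title, an author's name, a collection name
    object_type: str         # required: e.g. "monograph", "author", "blog post"
    extra: dict = field(default_factory=dict)  # optional, type-specific fields

    def __post_init__(self):
        # Only the two core fields are enforced; everything else is optional.
        if not self.primary_descriptor.strip():
            raise ValueError("primary_descriptor is required")
        if not self.object_type.strip():
            raise ValueError("object_type is required")

    def label(self) -> str:
        """Short display string that puts both core fields up front."""
        return f"{self.primary_descriptor} [{self.object_type}]"


author = DigitalObjectRecord("Melville, Herman", "author")
book = DigitalObjectRecord("Moby-Dick", "monograph", {"author": "Melville, Herman"})

print(author.label())  # Melville, Herman [author]
print(book.label())    # Moby-Dick [monograph]
```

Showing the object type right beside the primary descriptor is how the record for the author stays distinct from the record for one of the author’s books.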

DORs should be easily and heavily hyperlinked to enable interconnection, and we already see this sort of thing happening in our current library catalog records. When searching for ebooks, the records right now kick you out to the service where access is actually granted, generally not the library itself. The DOR would do the same for the records it was indexing, kicking the user out to the institutional repository, finding aid, blog post, etc., that the patron is seeking to access.

Past these two very basic metadata elements, the rest of the record should be entirely modular, based on the metadata dictionary developed for the ingest of the various types of objects being indexed. Different fields will be necessary for a surrogate catalog record than for an image in the institutional repository. Ingest of new records into the index would presumably be automated, so it would be necessary to crosswalk the metadata developed for the original item into the metadata standard being used by this hypothetical USE. While again I can’t claim to be a metadata librarian, it is my understanding that METS could be an excellent candidate, for its portability of data and modularity. At OSU, the Knowledge Bank describes all its items in Dublin Core, so the automated process that would ingest those items into the USE would crosswalk those DC fields to METS and populate them into the DOR. Based on my understanding of METS and metadata in general, I personally think it makes a lot more sense to utilize pure METS, and not extend it with fields from other metadata dictionaries, in order to accomplish the final goal of having shareable records. Extending it makes the records more personal to the institution creating them, but makes them far more difficult to share with other repositories. The USE itself could be a repository capable of being easily harvested using existing standards such as OAI-PMH, as well as making heavy use of linked data. Helping researchers create entries in linked data services could be an additional library service as well.
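Here is a minimal sketch of what the automated crosswalk step might look like in Python, assuming the source record arrives as Dublin Core element names mapped to lists of values. The target field names (`primary_descriptor`, `creators`, `date_issued`, `access_url`) are placeholders I invented for the USE; they are not actual METS elements, and a real crosswalk would be written against the real schema.

```python
# Hypothetical mapping from Dublin Core element names to USE field names.
CROSSWALK = {
    "title": "primary_descriptor",
    "creator": "creators",
    "date": "date_issued",
    "identifier": "access_url",
}

# Target fields that may hold more than one value.
REPEATABLE = {"creators"}


def crosswalk(dc_record, object_type):
    """Convert a Dublin Core record into a DOR dictionary.

    Returns the DOR plus any source elements that had no mapping, so that
    nothing is dropped silently during ingest.
    """
    dor = {"object_type": object_type}
    unmapped = {}
    for element, values in dc_record.items():
        if not values:
            continue  # skip elements with no values
        target = CROSSWALK.get(element)
        if target is None:
            unmapped[element] = values
        elif target in REPEATABLE:
            dor.setdefault(target, []).extend(values)
        else:
            dor[target] = values[0]  # single-valued field: keep the first value
    if "primary_descriptor" not in dor:
        raise ValueError("source record has no title to use as primary descriptor")
    return dor, unmapped


dc = {
    "title": ["Whaling Economics"],
    "creator": ["Doe, Jane"],
    "date": ["2016"],
    "subject": ["Whaling"],
}
dor, unmapped = crosswalk(dc, "thesis")
print(dor)
print(unmapped)
```

Returning the unmapped elements separately is a deliberate choice in this sketch: it keeps the shared record limited to the agreed fields while still showing the ingesting library what was left out.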

I think I can be fairly confident that I’m not the first one to come up with such a way of representing library resources online, but I thought it would be interesting and enjoyable to work through the concept here.