Monday, March 29, 2010

Discovering the need for discovery solutions that also support meta/federated searching

Lately, I’ve noted an increasing trend in the profession towards thinking that meta/federated searching is an unnecessary and unimportant tool for helping users find information. But I wonder: Is this the correct conclusion? If not, why?

Meta/federated searching has long been a technology with a complex mission and mixed experiences for users. Librarians and some end-users have voiced concerns about searches being too slow, interfaces that don’t mimic the full functionality of the native interfaces, and issues with relevance ranking. While these may be valid complaints, I have observed that librarians and end-users sometimes have two radically different objectives in using meta/federated search tools, and thus their respective experiences can vary widely.

When it comes to speed, meta/federated searching is slower than searching mega-aggregate indexes. It is frequently said that the fastest meta/federated search is only as fast as the slowest database being searched. Furthermore, it is understood that Google has come to define search response times, and anything over a few seconds is seen as too long. However, compared to an end-user or a librarian taking the time to search those resources separately, using differing interfaces, meta/federated searching is quicker, and it stands to offer resources that might not be included in the mega-aggregate index for reasons we’ll discuss below. The point here is that, provided users understand the trade-offs involved, they’ll recognize that meta/federated search is a time-saver.
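The latency arithmetic behind that trade-off can be sketched in code: a broadcast search fans queries out to all targets concurrently and waits for the slowest one, whereas a user searching each native interface pays the sum of every response time. This is a minimal sketch, not any vendor’s implementation; the database names and latencies are invented for illustration.

```python
import asyncio

# Hypothetical response times, in seconds, for three licensed databases.
TARGETS = {"DatabaseA": 0.2, "DatabaseB": 0.5, "DatabaseC": 1.0}

async def query(name: str, latency: float) -> str:
    """Simulate one remote search; sleeping stands in for network + search time."""
    await asyncio.sleep(latency)
    return f"results from {name}"

async def federated_search() -> list[str]:
    # Fan out to every target concurrently and wait for all of them.
    # Total wait is roughly max(latencies): the broadcast search is only
    # as fast as its slowest database, yet still far faster than the
    # sum(latencies) a user would pay searching each interface in turn.
    return await asyncio.gather(*(query(n, t) for n, t in TARGETS.items()))

if __name__ == "__main__":
    results = asyncio.run(federated_search())
    print(results)
    # Concurrent wait is about 1.0s (the slowest target); searching the
    # three targets sequentially would take about 1.7s.
```

The same arithmetic explains the complaint in the text: one slow legacy database drags down the whole broadcast, even though the concurrent total still beats any sequential alternative.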

If we look at librarians using meta/federated searching, we frequently find that they want everything in the native interfaces replicated in the meta/federated search interface. On one level this is understandable: librarians can and do draw on their deep familiarity with the resources they license to know where to search on their users’ behalf. Once there, they know how to use those resources quickly and effectively, and so they expect the meta/federated search interface to offer equivalent functionality. If they don’t find it, they view the search tool as a hindrance, an impediment to efficiently finding the resource needed.

Most end-users, on the other hand, simply want to enter a search query so the search tool will help them discover the resources that will meet their needs. (Remember Roy Tennant’s famous dictum: “Librarians like to search, users like to find.”)

I’ve always maintained that meta/federated search tools are just that – discovery tools. To view them in any other way is to set expectations that are unrealistic and ultimately doom the tool to a dismal reception by end-users.

Now, of course, we’ve seen the term “discovery solutions” come to mean a whole new set of wonderful, powerful search products. These are tools, based on the model of Google, that harvest content into a mega-aggregate of content that can be searched with rich functionality and incredible speed and ultimately allow libraries to deliver rich resources and functionality direct to their end-users. But do these new tools obviate the need for meta/federated searching tools?

My answer would be that most libraries likely need both of these solutions because they ultimately meet different end-user needs. Both are discovery tools, but they meet the needs of end-users in different ways and deliver different capabilities.

For example, an undergraduate needing to assemble a paper quickly might well benefit from a search of a mega-aggregate index that quickly produces several results that can be used interchangeably. However, the student or researcher conducting deep research into a subject will likely want to find everything available, not only from known resources but also from unknown ones. Then the need for meta/federated searching becomes more important, because it will very likely broaden the content they can find. These two divergent sets of needs require the library to offer different tools within a common discovery interface.

Naysayers will point to the fact that not all databases/titles are available through meta/federated search tools either.

And they would be correct. And the reasons are numerous.

Some content vendors may lack funds to upgrade their legacy platforms to allow harvesting. Yet, their content contains golden research that will remain in the dark and unused without meta/federated searching. Other content vendors believe their interface offers so much value that separating it from the content diminishes the value of their content. When this is the case, they’ll hinder or deny access to their content via meta/federated search tools.

Meta/federated search tools enable libraries to expand access to more of the library’s resources as well as other types of resources. For instance, they help address the “long tail” of resources by making them available to end-users. As the “long tail” concept describes, not all resources are in high enough demand to justify inclusion in a resource designed for the masses (the mega-aggregate index, in this case), but that doesn’t make them any less important to the end-users who would value their content. Finally, we must remember that we’re in a time of rapidly growing numbers of resources composed of radically different data types. Meta/federated search tools are likely to greatly increase the probability that libraries will be able to search these resources as well. Taken together, all of this helps researchers experience for themselves the “serendipity” of finding results where they did not expect them, gaining greater value as a result of using the library’s discovery tools.

As a librarian, a question you must ask yourself is: If you thought those databases/titles were important enough to subscribe to in the first place, why would you be willing to give up the ability to search them through a newer discovery interface that only searches a mega-aggregate index? Of course, I suspect you wouldn’t. When you’re evaluating a discovery solution, most vendors, including Ex Libris, will help you analyze the list of titles your library subscribes to and see how many of them will be searchable via the discovery solution. Coverage varies, of course, depending on your subscriptions, but if you have highly specialized collections, or otherwise fit into the areas described above, the coverage via the mega-aggregate index may not be as high as you expect. If so, you are clearly more likely to need the capabilities offered by meta/federated search solutions.

Now, if you’re in this class and you ask a discovery solution vendor whether its tool supports meta/federated search under that same interface, that vendor might well try to divert you by saying, “You don’t need it.” Be careful here, as that might be because the discovery vendor you’re talking with does not have the ability to integrate meta/federated search within its discovery interface (note: this is not the case with Ex Libris). If you need the capability, make sure you don’t let the vendor’s limited capabilities within its discovery solution limit your end-users’ discovery capabilities.

The important point is that meta/federated search is a way to expand access to the library’s licensed resources, and, importantly, access enabled by the library, not by the content provider. Meta/federated search tools enhance the searchers’ experience by exposing them to substantially more of the library’s resources, not just those offered by a specific vendor.

I see this trend manifesting itself in other ways within the new discovery solutions. I’m also concerned about libraries relinquishing control over the relevancy ranking of search results. Relevancy ranking has long been a point of difficulty in meta/federated search tools, and we’re seeing a new version of this problem with the new discovery tools built on mega-aggregate indexes. Librarians must be able to manage the relevancy ranking of results; again, this is part of the real value of librarianship. Yet I’m seeing some solutions offered to libraries in the market that restrict this capability to the vendor.

This is a mistake for librarians. If they surrender control of the software to this extent, it devalues the profession of librarianship. For the library, having this capability means being able to ensure that local content can be boosted in the result set so that local subject expertise is further spotlighted. More importantly, if you buy a meta/federated search product from a vendor owned by a content provider, and you don’t have control over the relevancy ranking, you have to wonder: Is content from that provider being boosted over that of your institution and/or other providers in order to promote utilization of its own content? It’s a potential quagmire with nasty implications for librarians who aren’t careful. Content neutrality on the part of the discovery interface solution provider has real value to libraries: it ensures a healthy competitive environment for the purchase of content, and it allows libraries to add value to the discovery solution.
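To make the idea of library-controlled relevancy ranking concrete, it can be pictured as a boost table the library configures and the software applies when merging results. This is only a sketch of the concept, not any vendor’s actual ranking algorithm; the source names, field names, and boost values are all invented for illustration.

```python
# Sketch of library-controlled relevancy boosting at result-merge time.
# All source names, field names, and boost values are invented.

LIBRARY_BOOSTS = {                     # configured by the library, not the vendor
    "institutional_repository": 2.0,   # spotlight local subject expertise
    "vendor_aggregate": 1.0,           # no preferential treatment for the vendor
}

def rank(results: list[dict]) -> list[dict]:
    """Re-rank merged results using the library's own boost table."""
    for r in results:
        r["score"] = r["base_score"] * LIBRARY_BOOSTS.get(r["source"], 1.0)
    return sorted(results, key=lambda r: r["score"], reverse=True)

merged = [
    {"title": "Local thesis", "source": "institutional_repository", "base_score": 0.4},
    {"title": "Aggregated article", "source": "vendor_aggregate", "base_score": 0.7},
]
print([r["title"] for r in rank(merged)])
# The local item rises to the top: 0.4 * 2.0 = 0.8 beats 0.7.
```

The point of the sketch is who sets the numbers: when the boost table belongs to the library, local expertise can be surfaced; when it is locked inside a vendor’s black box, the questions raised above about self-preferencing cannot even be audited.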

Clearly, to my way of thinking, most librarians will need both a discovery tool that works with mega-aggregate content and a meta/federated tool, so that they are offering a total discovery experience. That’s part of the reason users come to libraries, part of the value-add that should distinguish library services, and we should not overlook those facts just for the sake of convenience on our part. The tools should work together smoothly, within the same search interface framework, to make it as easy as possible for end-users to utilize both.

It is a matter of great import that, as we implement technological solutions, we make sure the value of librarianship is not only preserved but enriched by the technology we use. Without that need being met, librarians truly put their future at risk. It’s time for librarians to discover that discovery solutions are more than just searching a mega-aggregate index. They mean offering a total discovery experience, one best served by seeing the discovery tool as a content-neutral framework under which other tools such as meta/federated searching, recommenders, and link resolvers can all come together to promote the value of librarianship by offering end-users a rich and rewarding discovery-to-delivery information experience unlike any other they can find on the web.

All of this is summarized in a theme I keep echoing in my talks, articles and blog posts: librarians must buy technological solutions that allow them to add value to what is being offered to their end-users. When librarians give up substantial determination over the content offered to their end-users, whether because the aggregate is selected solely by their vendor or because they can’t add meta/federated search to the solution, they are giving up their right to add value to the solution.

Friday, March 19, 2010

The semantic web and linked data in libraries

In this month’s Library Gang 2.0 the discussion was about the Semantic Web and linked data. It was certainly an interesting discussion involving Karen Coyle, Marshall Breeding of Vanderbilt, Richard Wallis of TALIS, and myself. Karen started out by describing the world of linked data and how she sees Semantic Web concepts advancing the goals of linked data. (Note: Karen has recently written a Library Technology Report on this subject, which was the basis for the entire discussion.) In that report, Karen quite rightly points out that librarians need to “transform our data so that it can become part of the dominant information environment on the web” so that “the library catalog can move from being ‘on the Web’ to being ‘of the Web’”.
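For readers new to the idea, linked data expresses catalog records as subject–predicate–object triples, where shared URIs (such as Dublin Core terms) let anyone on the web combine statements about the same thing. Here is a toy sketch in plain Python; the book URI and record values are invented, while the `dcterms` predicates are real vocabulary terms. Production systems would use an RDF serialization such as Turtle rather than tuples.

```python
# Toy illustration of linked data: a catalog record as subject-predicate-object
# triples. The book URI and values are invented; the predicates come from the
# real Dublin Core terms vocabulary.
DC = "http://purl.org/dc/terms/"

triples = [
    ("http://example.org/book/123", DC + "title",   "An Example Catalog Record"),
    ("http://example.org/book/123", DC + "creator", "Jane Author"),
    ("http://example.org/book/123", DC + "subject", "Semantic Web"),
]

def describe(subject: str, graph: list[tuple]) -> dict:
    """Collect everything the graph says about one subject."""
    return {p: o for s, p, o in graph if s == subject}

print(describe("http://example.org/book/123", triples))
```

Because the predicates are URIs from a shared vocabulary rather than local MARC field tags, these statements can be merged with anyone else’s statements about the same URI, which is the practical sense in which a catalog moves from being “on the Web” to being “of the Web”.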

I pointed out that the semantic web shows the future possibilities of the Web, but that data on a world-wide basis realistically can’t be hammered into the semantic web model. Furthermore, the vision of the semantic web requires broad agreement across a massive number of user communities, and that would be very difficult to achieve. The semantic web also depends on ontologies and inference capabilities that are simply beyond our ability to define and put in place today. So the semantic web, on a broad basis, falls apart. Finally, by the time the vision of the Semantic Web is actually implemented, technology will have changed so fast and so much that what we actually see created will be very different from what is envisioned today. As Karen points out, the Semantic Web is currently the “flavor of the month” as far as technology goes, and I couldn’t agree more.

Karen’s report and this Library Gang 2.0 podcast are an interesting exploration of the applications of the principles of the Semantic Web and how libraries can benefit from the concepts at foundational levels today. I encourage you to read Karen's report and listen to the podcast.

Monday, March 8, 2010

Breaking down language barriers on the web

The current issue of “The Economist” (March 6-12, 2010) contains an article entitled “The many voices of the web” that is quite an interesting read. It is about translating the content of the web. I still see many libraries struggling to understand the basic idea that they even need to provide systems that support multi-lingual content. They all too often feel it isn’t a critical issue for their community of users. But this article points out that “the web connects over a billion people but it is fragmented by language”.

In the United States, for example, surveys show that roughly three-quarters of the population speak only English. As this article points out, there is rapid growth in Web content that exists only in other languages, ranging from Japanese, Chinese, and Arabic to many others. The ability to access that information is going to become more and more important to research, to critical thinking, and to forming a fully rounded understanding of the complex world in which we live.

The article goes on to describe efforts underway both at providing quick manual translations of web content and at developing new software for automated translation. Libraries, as both the keepers of the human record and the portal that provides access to that record, must keep a close eye on this technology and examine how it can be utilized in fulfilling their mission. It is no longer enough to think solely of providing access in a user’s native language; it is also becoming incumbent upon libraries to break down the fragmentation that exists between silos of language-specific content by providing dynamic access to translated versions of that digital content. To do that, we must once again think of the scalability of the task before us and realize that we must embrace technology as an essential, though not the sole, part of the answer to this need.

I recommend reading the article. It reminds you of some of the more interesting challenges we face in the days/years ahead.