Monday, January 14, 2013

GOKb: Is this an idea that will go?

Knowledge bases are a source of much frustration for today’s librarians and, if truth be known, for the vendors and organizations that assemble, market and sell them.  It is a very complex field and business.  The GOKb project is one open effort coming out of the library side that is trying to address some of those frustrations.

At the past Charleston Conference, I was invited to attend a presentation on GOKb, the Global Open Knowledgebase.  Here is a good slide presentation about the offering which, if you haven’t heard of it or don’t know what it is about, is a good introduction.  If you don’t have time to go through that slide show, this post from the Kuali site also provides a quick overview:
“GOKb is not a link resolver knowledge base; it is focused on global-level metadata about e-resources with the goal of supporting management of those e-resources across the resource lifecycle. GOKb does not aspire to replace current vendor-provided KB products. But it does aspire to make good data available to everybody, including existing KBs, and to provide an open and low-barrier way for libraries to access this data. Our goal is that GOKb data permeates the KB ecosystem so that all library systems, whether ILS, ERM, KB or discovery, will have better quality data about electronic collections than they do today.”
Now, I want to say right up front that I’m impressed with the creativity and intelligence behind the design.  The people behind this project clearly understand the problems they want to solve and what they’d like a solution to do.  

As I told the organizers after the Charleston session, in an ideal world, I think they’d be well on their way to success.  Unfortunately, as we all know, we don’t live in an ideal world.  So these are some of the issues I think need to be overcome for this idea to be viable and sustainable: 
  1. Focus.  In all honesty, what I heard and saw is a wide variety of issues, some of which I think might be better solved, or at least better understood, if the developers approached the vendors for an open exchange of concerns and ideas and jointly looked for solutions. GOKb appears to be taking on a very broadly scoped problem over which its solution offers only partial control for the foreseeable future.  That alone poses significant hurdles on the road to success. 
  2. Quality control.  Based on my experience, this is an area where I think GOKb is vastly underestimating both the need and the importance (although, after hearing this at the Charleston Conference, they may be doing more here).  I realize the community behind this idea is trying to apply the open source software model to building the GOKb knowledge base.  However, I also know what it takes to maintain a proprietary knowledge base: the level of expertise and knowledge, the relationships that need to be maintained with the publishing community, and the details that must be examined and massaged in this kind of data.  Given that, I’m highly skeptical that GOKb will be able to build, and equally important sustain, that kind of effort using community approaches.  In part that fear is based on the size of the community which, as I outline below, could be smaller than expected.  It will take a really large community effort to achieve the quality needed.  Furthermore, even if the community size did seem assured, there will be issues that require someone (like a committer in the OSS model) to take on the responsibility of deciding what is right and what goes into GOKb. Here are a couple of specific instances that are cause for concern:
    • Title changes (title histories as well as platform changes) are a frequent issue. How will these be dealt with and, at the same time, assure the quality of the data?
    • Vendor files may follow the KBART recommendations and they may be downloadable in a standard way, every week or month.  However, that does not, in and of itself, assure the quality of that data. Experience has often shown that it varies quite a lot, and thus commercial KB providers use thousands of rules in their scripts to massage the data, check it, change it … you name it. And the vendors still encounter problems after releasing it.
  3. Too many “if” statements.  There are a lot of good, solid ideas behind this project, but what “if” those statements don’t turn out the way desired?  Are there alternatives in the wings?  For instance:
    • OLE/GOKb dependency.  What I heard was that this is being coupled with OLE and that it is the use by the OLE members that will really drive the expansion of the GOKb.  However, even by OLE’s admission, adoption and production usage won’t happen till 2014, two years from now.  A lot can change in that time frame.  The GOKb people, when I pointed this out, told me: “OLE sites will have to load title lists to support their operations, linking to licenses and orders, etc. Those title lists will support GOKb. OLE sites will also have to maintain those lists because it’s what they pay for and track usage for. So even if that data isn’t driving discovery, it’s being maintained at the level needed for management, which is what we’re trying to do. Extending it beyond the OLE libraries is a roadmap goal, but not required for GOKb to serve its primary purpose as being a management KB for OLE.”  But as I’ve noted in this post in my blog, OLE is still primarily a project of large academic libraries and there is no assurance, at this point, that it will find a receptive market beyond those types of libraries.  If it does not, that will limit the size of the user base and will cause the workload of maintaining this data to rest on a much smaller number of institutions.  This will raise each participant’s need to commit resources.  Will that be possible?  Who knows at this point?
    • Another silo?  It appears to me that what is being built here is yet another KB silo to be maintained and interfaced with existing KB silos.  In order for that to be at all manageable, it will need to be an automated interface, presumably through APIs, so that an exchange of data can be easily performed.  My concern here is that we have neither the needed interface specified nor the standards in place that are required to make this a truly scalable and easily implemented process.  Those steps will require a great deal of time, as vendors will need to get the specs nailed down and then factor them into their development schedules.  Furthermore, even if we have the APIs and the data formats in place, cross-referencing these databases is often very problematic.  It may not be unreasonable to expect to match the big resources in the KB, but it is wildly optimistic to think the majority will match as intended, and this will result in a lot of work and effort to sort these out on a continuing basis. The GOKb organizers tell me that they are seeking willing partners on the supplier side to exchange data using existing standards.  As they point out, with applications being developed that can accept ONIX data, vendors can no longer say there are no systems that consume that data.  They also tell me that the JISC partners have some evidence that all the players have expressed some interest in this standard and have done some work with it.  So there is some cause for hope here, although I’m worried the timeline for all this to happen is much longer than most librarians realize. 
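
To make the quality-control point above a little more concrete, here is a minimal Python sketch of what a handful of such checking rules might look like for a KBART-style, tab-delimited title list. It is only an illustration, not how GOKb or any vendor actually does it. The function names are hypothetical, the column names are taken from the KBART recommended practice, and the sketch assumes every row describes a journal whose identifiers are ISSNs and whose dates are written ISO-style (year first).

```python
import csv
import io
import re

# Shape of an ISSN: four digits, a hyphen, three digits, then a digit or X.
ISSN_RE = re.compile(r"^\d{4}-\d{3}[\dX]$")


def issn_is_valid(issn):
    """Return True if the ISSN has the right shape and a correct check digit."""
    if not ISSN_RE.match(issn):
        return False
    digits = issn.replace("-", "")
    # Standard ISSN check: weight the first seven digits 8 down to 2.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def check_row(row):
    """Apply a few illustrative rules to one row; return a list of problems."""
    problems = []
    if not (row.get("publication_title") or "").strip():
        problems.append("missing publication_title")
    # Assumption: journal rows only, so both identifiers should be ISSNs.
    for field in ("print_identifier", "online_identifier"):
        value = (row.get(field) or "").strip()
        if value and not issn_is_valid(value):
            problems.append(f"bad {field}: {value}")
    first = (row.get("date_first_issue_online") or "").strip()
    last = (row.get("date_last_issue_online") or "").strip()
    # Year-first dates compare roughly correctly as plain strings.
    if first and last and first > last:
        problems.append("coverage ends before it starts")
    return problems


def check_kbart(text):
    """Check a whole tab-delimited file; map line numbers to their problems."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    report = {}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        problems = check_row(row)
        if problems:
            report[line_no] = problems
    return report
```

Three rules are easy to write. The hard part, and the reason for my skepticism, is the thousands of publisher-specific exceptions that accumulate on top of rules like these, and deciding who in a community project maintains them.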

There is a lot of good effort and money being put into the GOKb project and ideas.  There are clearly issues surrounding GOKb that need resolution. Without that resolution, GOKb might end up being yet another silo of data to be maintained, and one without a clear pathway to the broader adoption and support that will sustain it.  As I’ve noted in many blog posts and in my many talks, librarianship needs more of these types of collaborative efforts, and this one incorporates many excellent ideas.

I urge librarians to pay close attention to GOKb and to contribute and participate in any way they can to make it a viable and sustainable idea.  Clearly the time to do that is right now.

NOTE: After posting this, a reader reminded me (and I apologize for not including it in the original post) that JISC has also been making some efforts on this front with their KB+ project.  Data to be included in that knowledge base includes: a) Publication Data for all NESLi2, SHEDL and WHEEL agreements, all freely available under a CC0 license, b) Subscription Information and c) License information.  The GOKb people had previously mentioned to me that they were in touch with JISC about KB+ and sharing ideas.  Another reader has told me that there is actually a great deal of cross work happening between the two groups including the sharing of resources and joint meetings (with the next one scheduled for late January 2013).  So, hopefully this will have a good result for both projects.