Tuesday, September 11, 2012

We have a problem... another vendor appears to need an education about exactly WHO owns library data.

I'm an educator and a librarian, and as such I have long been a fan of history. I'm also a firm believer in the statement "Those who don't know history are doomed to repeat it." So I probably shouldn't be surprised when a vendor of library software and content products comes along and allegedly makes statements, later posted on social media, showing that it hasn't paid attention to history and is now going to learn what others painfully found out long ago.

This exchange happened last month on Twitter:
Having been involved with library automation for a long time, I've watched this scene play out so many times I can recite it like a veteran actor in a long-running Broadway play. This same tactic was tried back in the days of the early ILS systems. It happened again with bibliographic data and OCLC (now largely resolved). Now it is apparently going to be repeated with holdings data captured in library knowledgebases.

The library's position on data they create and enter into products is well known, documented and often backed up by legal statute:  The Library owns their data.  They created it using their resources and funding and they maintain it -- it is theirs.

Vendors who have learned this chapter of library automation history most likely state this explicitly within the contract signed with the customer. Many libraries will insist the clause be included before they'll sign the contract (they should also include, but frequently don't, wording on how fast those data will be provided to them when requested and at what cost). Obviously, some vendors don't provide that language in their contracts and/or some customers forget to insist it be there, leading to this kind of unhappy and unpleasant exchange.

As the Twitter discussion above shows, this most frequently occurs when a library wants to use their data for a purpose that the vendor/organization holding it views as a threat to their interests. Vendor thinking here usually runs along these lines:
  1. Try to lock the customer into the vendor's solutions by making the extraction process long and costly (i.e. very painful), or
  2. Claim the data reveal proprietary information, or
  3. Prevent the customer from selecting and using a competing product (as suggested in the thread above), or 
  4. Use the customer data as a bargaining chip (i.e. hold it hostage) in a negotiation with another vendor. 
None of these are acceptable.  The library community, as a group, should stand against this.   

I've written in a previous post about my concern with cloud-based systems leading to increasing vendor silos and lock-in.  This exchange is proof that concern is valid.  Plus, we must realize that placing our data in the cloud means we likely don't even have a copy of those data. This makes the issue all the more alarming.  (Of course, this provides another point to consider for inclusion in your legal agreement for cloud-based services.  Get a periodic copy of your data, delivered to you, in an industry standard format for safe-keeping).    

Let's look at the list above in more detail:

1.  Providing a customer's data back to them is a well-known process. Any vendor/collaborative that requires more than 30 days to do so, and that can't specify a charge for it upfront (including an inflation factor is reasonable), is guilty of trying to hold its customer hostage. Libraries should prevent this scenario by carefully negotiating their contracts on the front end to include language covering these points.

2.  It is equally specious for a vendor/collaborative to claim that providing data reveals proprietary information. Data that would do so can always be transformed on the way out of the system into a neutral, documented industry exchange format. This is true for all types of records, and any claims to the contrary are simply false. For example, it is a de facto standard within the library market that electronic holdings data are exported using the Google Scholar export format (just as we use MARC for bibliographic data). So these holdings data can be exported in an agreed-upon, neutral format. At the same time, librarians have to realize that library holdings data can be attached to licensed third-party data, and a vendor/collaborative can't export that third-party data without the right to do so. However, those data can be detached so that only the data the vendor/collaborative has the right to export are exported. Referring to the Google Scholar export format again, it should be noted that it was designed with this in mind. What is equally important is for librarians to understand exactly which data are licensed from third parties and which data are theirs, so they know what to expect in an export file. The bottom line is that data owned by the library can be exported in some neutral format, so don't accept any claims to the contrary.

3.  It will always be true that vendors/collaboratives can better integrate products, and thus provide smoother workflows, if they have total control of all the components. However, when the vendor/collaborative community mandates that choice, libraries lose several key things, such as: 1) An exit strategy. A library, particularly when using a cloud-based service, must ensure it has the ability to exit one system and move to another. 2) The ability to integrate the best solutions available in the marketplace, and with it the ability to provide the best service for their members. 3) Purchasing power in the marketplace.

Can the loss of the above be justified by better workflows? It seems unlikely to me. No one vendor/collaborative is going to offer the best solutions in ALL areas of library activity. In my opinion, libraries need to maintain the right to have choices and for those choices to include competitive offerings. This is not something you can negotiate solely in your purchasing agreements; it is something you must also mandate through your purchasing behavior. Libraries should insist on the ability to do the things listed above and should not purchase products/services from vendors/collaboratives that do not support these choices.

Of course, for systems to work well together, librarians must also participate in defining standards for the use of Application Programming Interfaces (APIs). This is what enables systems integration (see this blog post for more information). If we fail to do this, we fail to serve our profession as well as we should.

4.  Placing libraries in the middle of contract negotiations between vendors and other providers is also inappropriate.  Let's look at one example showing how this can happen.  Before getting into the details of this example, it is important to understand the difference between offline export of our library's data and online access to systems containing our library's data because these are two very different things.   

Offline export of library-owned data, i.e. putting it in a file and on media so that it can be readily given to the customer, is a standard process and is essential so the data can be used by whomever the library designates to serve the library's needs. As noted above, this should be done in a reasonable, known timeframe, at a reasonable, known cost (covering the cost of export, media and shipping) and using industry-standard, documented formats. These data are simply entrusted to the vendor or collaborative for the purpose of offering a service to its members. Upon demand, they should be returned.

Online access to systems containing a library's data is typically done through APIs and represents a very different scenario. Yes, the same data might be available from those APIs, but APIs carry overhead: the cost of maintaining them and making them work together smoothly, as well as the cost of the hardware and network running them. It is also important to realize that usage of the APIs can impose substantial loads on a vendor/collaborative system. It's also the case that APIs are not yet standardized, so some custom programming is frequently required to make these things work. All of these create costs for which a vendor/collaborative can fairly request compensation. It's compensation not for the data, but for the overhead imposed by serving the data through the APIs.

With those two types of access described, here is how all these variables combine in a contract negotiation between vendors/collaboratives. Returning to the example this section started with: there are a lot of vendors/collaboratives working to open up their systems through their APIs. Some vendors/collaboratives have done this for years and others are just starting to do so. However, the right to access a vendor/collaborative system gets very sticky when it means opening up those APIs to organizations the vendor/collaborative views as competitors.

Of course, libraries want vendors to do this for the very reason I cited above, so they can integrate best-of-breed solutions. Vendors, or a collaborative, can see this as giving away a competitive advantage and thus may flatly refuse to do so or demand that the other vendor negotiate a separate agreement for access (usually at a cost). While refusing access outright is unacceptable, it's not unreasonable for a vendor/collaborative to demand fair compensation and/or agreement on the rights and terms of usage. Thus negotiations open up between the vendors/collaboratives.

So, as might be expected, in some of these negotiations the vendors can't reach agreement on the usage of the APIs. When that happens, if one of them is holding customer data and that customer then chooses to use a competing product from the other vendor in the negotiation, the first vendor sometimes refuses to do an offline export of the customer's data. The customer's data are held hostage (as shown above in the Twitter exchange), and consequently so is the library. Yet the library has no say in the negotiation and is prevented from using the product it wants to use to deliver services to its users/members. This is unacceptable and libraries must not stand for it.

When, for whatever reason, a library wants the data they own back, the vendor/collaborative has a moral (and frequently legal) responsibility to give it back to the library at a reasonable, known cost, in a reasonable, known timeframe and in a documented industry standard format. 


These vendors/collaboratives exist to serve libraries, not the other way around. Libraries vote with their dollars and purchasing choices, and to prevent this kind of behavior they must use their power collectively. When libraries act separately, vendors/collaboratives frequently take a "whack-a-mole" approach to divide and conquer. Libraries must band together and, through their organizations, issue a profession-wide policy statement concerning library ownership of data. This policy statement should then be referenced in both purchasing and legal agreements as a requirement to be met.

We have a lot of new battles to fight and new challenges to overcome, and we simply do not have the time to educate vendors or collaboratives in the marketplace on issues we thought were solved long ago. So we have to pick our battles carefully. However, this one is important. We must make it clear that libraries own their data. The data may be provided to vendors to use with their products and services, but upon demand, they must be given back to the library at a reasonable, known cost, in a reasonable, known timeframe and in an industry-standard, documented format. Library data must NOT be used as a bargaining chip.

I encourage librarians to act, to issue a policy statement on this topic and to put it to rest once and for all. This is a history lesson we don't want to repeat.