Thursday, November 9, 2017

"Living under the API." Some things librarians need to consider.

At OU Libraries, we've been investigating interesting new tech products, and when requested we provide input into the development of some of these products.  There truly are new tech products in the pipeline that are going to offer libraries a real chance to provide new value-add capabilities to our communities of users. But. These new products are also raising some real concerns, concerns that will require us to think very carefully as we develop, implement, and use them. And not just us at OU Libraries, but all librarians. Let's look at just a couple of those:

  1. First, as is happening all around us, we're increasingly seeing data and algorithms interwoven into academic scholarship and librarianship.  Machine processing is clearly going to provide us with a whole new dimension of analysis and research results, but, as with any new technology, it is also providing a new set of challenges.  Two recent moments underscored this point for me.  The first was a talk Dave King of Exaptive gave at our recent Research Bazaar.  During that talk, Dave stated: "In the past, programmers wrote code to implement management decisions.  Now programmers are writing code that makes decisions and managers are trusting it because they don't understand what is happening in that code.  This is dangerous."  Dave's company writes exactly this kind of code, so it's really important to listen to what he is saying about how it gets developed.  The second moment, which underscored Dave's point, came while I was reading Adam Greenfield's new book, "Radical Technologies."  It coined a phrase that captures a lot of what is happening with technology: people are increasingly "living under an API."  APIs (Application Programming Interfaces) can embody algorithms that shape our lives, and it's happening in social media, in search engines, even in your local Target and Walmart stores.  Yes, this can produce great value.  It can also be, as Dave said, very dangerous, amplifying societal problems we're already wrestling with full-time.  Another recent book, McAfee and Brynjolfsson's "Machine, Platform, Crowd," states something equally important for us to remember in this discussion: "Technology is a tool. That is true whether it's a hammer or a deep neural network. Tools don't decide what happens to people. We decide."  So, when working with these new technologies, we at OU Libraries spend a great deal of time thinking about issues surrounding the support and development of critical thinking skills and the reproducibility of research results.  We're concerned about the need for:
    • An understanding, by all parties involved, that software used in analyzing and processing data must be either open source or extremely well documented, such that it is clear what is happening within the processing sequences of that code.  Now, proprietary vendors are understandably not going to make their entire products open source; they need to recover costs and make a profit to sustain and grow their companies.  But the code bits that do the actual analysis and processing do need to be either open source or openly and accurately documented.  Peer review of code and code logic would be one way to ensure the integrity of the work and to ensure that biases are not written into the code.  This matters because, if we don't watch for it, biases will be propagated across generations of research as results from one project become the basis for subsequent projects.  For code to be peer reviewed as a scholarly product, it would also need to be: i) documented, ii) reusable by any scholar trying to replicate results, iii) citable, and, finally, iv) versioned (to ensure accurate reproducibility, including workflows).  A minimal sketch of what this might look like in practice follows this list.
    • The same set of issues arises with the data the code operates upon, be it numerical data, visualizations, citations, or the full text of publications.  When using data, we need to ensure both its quality and its openness, so that others can verify and reproduce research findings.  (The second sketch below illustrates one simple way to verify that a dataset is the one originally published.)
    • These items are really not optional; they can't be, if we are to ensure the integrity of the research done at our universities.
  2. The second major issue we're seeing is that many of these new tools are tightly coupled to content that is also supplied by the vendor.  Now, as many of you know, I've been on the vendor side of the discussion table in my previous lives, and I fully understand the desire to provide value-add as a way to increase sales of content.  Understood.  But that's a sales model from the past, not the future.  Analysis tools need to be decoupled from the content.  Our thinking, as librarians, is focused on the value-add of the tools on top of content, and thus on the need to use those tools across all content, not just the content in one vendor's suite of offerings.  Anything less poses real challenges in training community members and in the overall ease of use of library resources.  (We really don't want to be in the position of saying: "Yes, we know you can do that with these databases, but you can NOT do it with those databases…")  What's required, we believe, is a shift in understanding on the part of content providers: content is increasingly becoming a commodity, and the value (and thus future sales) is coming from the differentiation of the tools provided on top of that content.  The fact that vendors are producing these types of tools already shows an understanding of this, but the fact that they continue to couple the tools with only their own content shows a bifurcation of thinking that we fear is not healthy for anyone concerned.
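To make the code-review requirements in point 1 concrete, here is a minimal sketch, in Python, of what recording a versioned, citable provenance trail for an analysis script might look like.  This is purely our illustration, not any vendor's product; the file names and fields are hypothetical.

    import json
    import subprocess
    import sys
    from datetime import datetime, timezone

    def provenance_record(script_name):
        """Capture what a reviewer would need to re-run this analysis."""
        record = {
            "script": script_name,
            "python_version": sys.version,  # exact interpreter used
            "run_at": datetime.now(timezone.utc).isoformat(),
        }
        # Version the code itself: record the exact commit the analysis ran
        # from, so later researchers can reproduce (and cite) this version.
        try:
            record["git_commit"] = subprocess.check_output(
                ["git", "rev-parse", "HEAD"], text=True
            ).strip()
        except (subprocess.CalledProcessError, FileNotFoundError):
            record["git_commit"] = "unknown (not run from a git checkout)"
        return record

    if __name__ == "__main__":
        # Write the record alongside the results so the two travel together.
        with open("provenance.json", "w") as outfile:
            json.dump(provenance_record("analysis.py"), outfile, indent=2)

The particular fields matter less than the habit: every result should travel with enough information to identify exactly which code, at exactly which version, produced it.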
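The data side of the same concern can be handled just as simply.  Assuming the original researchers published a SHA-256 checksum alongside their dataset (the file name and checksum value below are hypothetical), anyone attempting to reproduce their findings can verify, before running anything, that the data in hand is bit-for-bit identical to what was published:

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        """Hash a file in chunks, so large datasets need not fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # The checksum published with the original dataset (hypothetical value).
    PUBLISHED_SHA256 = "expected-checksum-from-the-data-record"

    if __name__ == "__main__":
        if sha256_of("citations_dataset.csv") != PUBLISHED_SHA256:
            raise SystemExit("Dataset differs from the published version; "
                             "results may not reproduce.")
        print("Dataset verified; safe to proceed with the analysis.")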
For these new technologies to be successful, both for the organizations producing them and for the profession of librarianship, these issues really need to be addressed head-on, all around the table, whether the tool is open source or proprietary.  As librarians, we need to do our homework and our due diligence to ensure we understand, in detail, the topics involved.  We also need to insist that the technology allow us to continue to support our core values as well as those of the communities we serve.


###