Tuesday, June 30, 2009

Digital Preservation -- Observations from PASIG

The SUN Preservation & Archiving Special Interest Group (PASIG) meeting, held last week in Malta, was once again a meeting of thought leaders in the technical aspects of digital preservation. Many common trends emerged as I listened to the excellent presentations.

Among the first trends I observed was this: digital preservation is still perceived as "too hard and [as a result, most feel they] can't do it," as Steve Knight of the New Zealand National Library put it. He also noted the continuing "need for a sustainable digital preservation solution." Steve's observations closely match my own experiences and concerns. Too many librarians and archivists are quite simply ignoring the issue and/or using the economic crisis to avoid dealing with the challenges of digital preservation. Yet we face staggering growth in digital data, which underscores Steve's point that building long-term, economically sustainable solutions is critically important.

The issues of context and technology crossed paths in conversations with, and presentations by, David Rosenthal of Stanford University. He pointed out the difficulties we face in preserving the context of objects, as well as the information objects themselves, when so many objects are dynamically assembled, as mashups are. He posed these questions: How do we capture all that information, and do so in a way that lets it be re-used? And how do we build solutions that will scale to meet these needs? These points again raised the question of what we choose to preserve, because capturing any one information object in context can exponentially expand the total quantity of data to be captured. In addition, David raised the point that we don't yet have the needed "copyright framework for research data". Resolving that legal issue means working around people with vested interests and working directly with research funding authorities at the national level. Good points and questions, each and every one of them.

The need for more best practices, for creating and/or updating standards, and for certification of Trusted Digital Repositories (TDRs) also emerged as a common theme. For instance, while the OAIS update has been out for review for quite some time, the status of that work is very unclear. Yet the FamilySearch attendees showed in their presentation how they have extended the standard to address scalability issues that they encountered and that the original design did not address. To me, these extensions are exactly the kind of input that needs to be fed into the OAIS revision process. Chris Rusbridge of the University of Edinburgh gave an interesting presentation about the need for repository certification and TDRs in order to provide users with the "trust" required to place their digital objects in preservation repositories. This issue has all too often not received enough attention, and Chris is absolutely right that it is very important.

During a panel session, I raised the point that while we were seeing a lot of thought leadership in the PASIG presentations, what was needed was not just thought leadership but "active and coordinated" thought leadership -- which it did not feel like we were achieving. I specifically put forth that, given the sheer size of the constantly growing corpus of digital data to be captured, not to mention the system scalability issues incurred by that growth and the business models needed to make it all affordable, we might seriously question whether the conference represented the best use of the valuable resources it entailed.

Too often, throughout the conference, it felt like many people were busy trying to solve the same problems, often arriving at similar yet different solutions that in the end catered to very specific and unique organizational needs. One wonders if we wouldn't be better off focusing on a larger, more generalized vision. By designing an overall framework into which developments made by the many conference participants could be plugged together for a more comprehensive solution, far greater progress might be made. Certainly best practices such as policy templates, lists of applicable and needed standards, audit practices, and standard backup processes/procedures would all be good starting points. For instance, we could take the list of identified areas needing standards development and put forth light-weight, quick-to-the-field draft standards that could evolve into full, accredited standards after we had actual experience in using them.

One approach that should definitely be used more is national plans, and possibly even national funding, to drive digital preservation work. I was particularly impressed at the conference by the work being done in Slovakia in this area. They have come up with a "National Information Infrastructure" plan that covers digital preservation, and they've created an "Integrated Conservation Centre" to help coordinate libraries' digitization initiatives. In a more specific example, Rob Sharpe of Tessella called for the creation of more national registries. All are examples of approaches that should be replicated on a wide scale.

It was also very interesting to note the number of presentations that involved video and/or audio-visual objects, from parallel but separate fields. One remark by Richard Wright of BBC Future Media and Technology was particularly telling: he pointed out that they are also trying to solve many of the same issues and are coming up with similar ideas. I've never heard a clearer call for librarians and archivists to reach across traditional boundary lines and work arm-in-arm with others to solve some large-scale problems.

At the end of the conference, while the content was excellent, I was left with the concern that, although we continue to face huge challenges in digital preservation, we're trying to solve those challenges individually rather than collaboratively. That is an approach we can't afford financially or strategically. While we grapple with the challenges, the black hole that is permanently sucking away digital content that should be preserved is growing. We'll never be able to recover that content. The only hope we have of bringing needs into line with capabilities is to envision a large-scale plan, seriously evaluate what can be done, how, and by whom, and start parceling out assignments so that the collective results can be brought forward to the profession.