Moves to allow the digitization of ‘orphan works’ and free up the metadata around 20 million cultural objects will benefit the public and could inspire a new wave of apps and web services. But the underlying motivation is fundamentally political.
DuraSpace, a not-for-profit consortium of universities, libraries and museums, just launched a SaaS solution to simplify the preservation of digitized cultural objects in the cloud. Using public cloud resources from both Amazon and Rackspace, DuraCloud hides the complexity of ensuring that valuable resources are preserved for the future. This launch of a subscription service takes the DuraSpace organization in new directions — and possibly to the private cloud.
University and national libraries, as well as museums and archives, have been digitizing their collections since the earliest days of the web. This work both increases access to rare and delicate material and serves to preserve something for future generations if disaster should befall the original work. Digital copies of cultural artifacts — and the metadata used to describe them — have typically been stored in digital repositories such as DuraSpace’s DSpace and Fedora, or the UK’s ePrints. For richly funded institutions such as MIT, Columbia or Cambridge, these systems have worked well. But in smaller institutions, projects have been more likely to use software like Microsoft’s Access database. Although less suitable for the task, these simple databases have been easier to use than the free but complex repository systems offered by DuraSpace and others. DuraCloud promises to take capabilities previously reserved for the rich and well-staffed institutions and make them available in a web browser to anyone.
Packaged as a hosted service that removes the need to configure hardware or patch software, DuraCloud initially appears expensive at $375 per month. The price includes an Amazon or Rackspace virtual machine (worth about $70) and 500 GB of storage (worth $60–$70), as well as support and updates. Additional storage is billed at the chosen cloud provider’s list price and added to the DuraCloud invoice. DuraSpace CEO Michele Kimpton sees this single invoice as one way that DuraCloud delivers real value to subscribers. Purchasing rules in many libraries, for example, prevent the use of credit cards. Invoices from DuraCloud are far easier for libraries to deal with, because they fit entrenched processes based on purchase orders, approvals and invoices in ways that a traditional SaaS application’s reliance on credit cards or PayPal does not.
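To see what the subscription buys beyond raw infrastructure, the figures quoted above can be worked through in a few lines. This is only a back-of-the-envelope sketch: the function and constant names are made up for illustration, and the dollar values are the approximate ones given in this article, not DuraCloud's own cost accounting.

```python
# Figures quoted in the article, in USD per month.
SUBSCRIPTION = 375
VM_VALUE = 70
STORAGE_VALUE_RANGE = (60, 70)  # approximate value of the included 500 GB


def service_premium(subscription, vm_value, storage_value):
    """Part of the fee not accounted for by the VM and storage."""
    return subscription - vm_value - storage_value


def premium_range():
    """Lowest and highest premium implied by the storage value range."""
    low_storage, high_storage = STORAGE_VALUE_RANGE
    # A higher storage value leaves a smaller premium, so the bounds flip.
    return (
        service_premium(SUBSCRIPTION, VM_VALUE, high_storage),
        service_premium(SUBSCRIPTION, VM_VALUE, low_storage),
    )


if __name__ == "__main__":
    low, high = premium_range()
    print(f"Support, updates and management: ${low}-${high} per month")
```

On these numbers, roughly two thirds of the monthly fee pays for support, updates and the convenience of a managed, invoiced service rather than for compute and storage.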
And let’s not forget redundancy, a key principle of digital archiving: The more copies of a document, the less chance there is of losing something forever. However, many institutions struggle to achieve this in a cost-effective manner. DuraCloud’s management interface offers a solution to the problem by letting institutions redundantly store data in multiple Amazon regions or replicate across both Amazon and Rackspace. A sync service ensures that copies remain identical and notifies administrators if data loss occurs. Copies held in other regions could replace lost data.
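The idea behind such a sync service can be sketched as a simple integrity check: record a checksum when an item is stored, compare every copy against it, and restore damaged copies from an intact one. The code below is a hypothetical illustration of that general technique, not DuraCloud's implementation or API. The function names and the provider labels in the usage example are assumptions, and replicas are modeled as in-memory bytes rather than real cloud objects.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum used to decide whether two copies are identical."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(expected: str, replicas: dict) -> list:
    """Names of replicas that are missing or no longer match the checksum."""
    return [
        name
        for name, data in replicas.items()
        if data is None or sha256_hex(data) != expected
    ]


def repair(expected: str, replicas: dict) -> list:
    """Overwrite damaged replicas with a verified copy; return what was fixed."""
    damaged = find_damaged(expected, replicas)
    healthy = [name for name in replicas if name not in damaged]
    if not healthy:
        # Every copy is lost or corrupted; nothing can be restored.
        raise RuntimeError("no intact copy left to restore from")
    for name in damaged:
        replicas[name] = replicas[healthy[0]]
    return damaged


if __name__ == "__main__":
    original = b"scan of a manuscript page"
    expected = sha256_hex(original)  # recorded when the item is first stored
    replicas = {
        "amazon-region-a": original,
        "amazon-region-b": b"corrupted",
        "rackspace": None,  # copy has gone missing
    }
    print("Repaired:", repair(expected, replicas))
```

Comparing against a checksum recorded at ingest, rather than taking a vote among the copies, means the check still works when an institution keeps only two replicas.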
As well as supporting Amazon and Rackspace, DuraCloud will soon add Microsoft’s Windows Azure. There is also an adapter for Eucalyptus, and Kimpton says she is “looking for a partner” interested in running a Eucalyptus-powered option. She is also interested in OpenStack and tracks Rackspace’s transition to OpenStack code. A UK project exploring the feasibility of running centralized cloud infrastructure for universities might be one place in which an OpenStack-based DuraCloud installation could be tested. As a dedicated academic private cloud, it should drive costs lower than the DuraCloud service itself can, by hosting its own version of the (open-source) DuraCloud software on virtual machines and storage that can then be rented to partners at lower rates than commercial cloud services can match.
As private academic clouds like the OpenStack-powered one at the San Diego Supercomputer Center (SDSC) begin to appear, DuraCloud would be wise to evaluate the cost of basing future services on a network of similar installations at big cultural and academic institutions, rather than depending on the more commercial public cloud services. Shared but private cloud infrastructure running in a small number of larger cultural institutions might be capable of reaching sufficient scale to compete cost-effectively with the public infrastructure upon which DuraCloud relies today. Given the scale of the cultural sector and its long-term perspective on preservation, could this be a case in which the private cloud proves better than the public?
Question of the week
Brewster Kahle spoke in the same session as me at an event near Amsterdam today. Among a number of interesting points, Kahle mentioned in passing that the Internet Archive offers cloud storage services to public institutions, with contributions of around $2,000 ‘endowing’ a terabyte of storage in perpetuity. This is an intriguing model, and cautious archives, libraries and museums may be happier to trust Kahle than strictly commercial providers. Does this relatively safe and straightforward first step make it easier or harder for those institutions to subsequently adopt mainstream services, and might they bring pressure to bear on the Archive to offer a fuller range of cloud services that compete more directly with Amazon et al?
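For a rough sense of scale, the endowment figure can be set against the storage value quoted earlier for DuraCloud's included 500 GB. The sketch below is a hedged comparison only. It assumes 1 TB = 1,000 GB, treats the $60–$70 estimate as a flat monthly rate, and ignores bandwidth, replication and future price changes. All names are illustrative.

```python
ENDOWMENT_PER_TB = 2000            # one-off USD contribution quoted by Kahle
INCLUDED_STORAGE_TB = 0.5          # 500 GB, assuming 1 TB = 1,000 GB
INCLUDED_STORAGE_VALUE = (60, 70)  # USD per month, as quoted for 500 GB


def monthly_cost_per_tb(value_for_included_storage):
    """Scale the quoted value of 500 GB up to a full terabyte."""
    return value_for_included_storage / INCLUDED_STORAGE_TB


def break_even_months(monthly_per_tb):
    """Months of pay-as-you-go storage that add up to one endowment."""
    return ENDOWMENT_PER_TB / monthly_per_tb


def break_even_range():
    """Shortest and longest break-even period implied by the value range."""
    low_value, high_value = INCLUDED_STORAGE_VALUE
    return (
        break_even_months(monthly_cost_per_tb(high_value)),
        break_even_months(monthly_cost_per_tb(low_value)),
    )


if __name__ == "__main__":
    shortest, longest = break_even_range()
    print(f"Break-even after {shortest:.1f} to {longest:.1f} months")
```

Under these assumptions, a one-off endowment costs about the same as 14 to 17 months of commercially priced storage for the same terabyte.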
There’s no denying that the Wayback Machine is an incredible resource. But the design was starting to show its age. I recently spoke with George Oates, who led a complete redesign of the site, about how she developed the simple design and more user-friendly navigation system.
A single commodity hard disk is fast on its way to being able to store every song ever recorded;* a close examination of how the rapid improvement of storage technology might apply to communication, therefore, is long overdue. Consider email, where the retention of messages enables the threading of conversations by recipient, subject and date. Although recording telephone calls usually brings government wiretaps to mind, the merits of a communication archive from an end user’s perspective deserve some consideration.
Few over the age of 25 will like the idea of creating a permanent record of telephone calls and other forms of communication, but the discomfort of mature adults can represent a counter-indicator. Plus, it seems safe to assume that people can distinguish between government (bad) and personal (good) uses of recording technology. Communication archives will require strong privacy tools and a reliable delete function, but an argument against a permanent record is an argument against communication. After all, people avoid email in some contexts, but no one proposes eliminating email archives.