10 tips to preserve data for the long haul

More than 452 exabytes of information have been created and replicated this year.

The growth of digital data is threatening to spiral out of control. More than 452 exabytes of information have been created and replicated this year -- more than the world's available storage capacity, according to IDC.

Not all data should be preserved, but efforts to save important information are being stymied by many factors, a new report says: complacency, fear that the problem of long-term digital access and preservation is too big to take on, inadequate funding, confusion, and lack of alignment among stakeholders. A better model for preserving data is needed, and it requires worldwide collaboration, says the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, which consists of experts from universities, major libraries and one tech company (Microsoft).

"The long-term accessibility and use of valuable digital materials requires digital preservation activities that are economically sustainable -- in other words, provisioned with sufficient funding and other resources on an ongoing basis to achieve their long-term goals," task force co-chairman Brian Lavoie of the Online Computer Library Center said in a press release.

Although the task force says an industrywide solution is needed, individual IT shops can take many steps on their own to implement a better data preservation plan. The task force's second co-chair, Fran Berman, director of the San Diego Supercomputer Center (SDSC) at the University of California, San Diego, offered a list of 10 tips for preserving data in a recent article.

Here is a look at Berman's advice:

1. Make a detailed plan for the stewardship and preservation of your data, from its inception to the end of its lifetime.

2. Be aware of data costs including hardware, software, support and time, and include them in your overall IT budget. Determine whether it is more cost-effective to regenerate some of your information rather than preserve it over a long period.

3. Associate metadata with your data. Identify relevant standards for data and metadata content and format, and follow them to make sure the data can be used by others.

4. Make multiple copies of valuable data. Store some copies off-site and in different systems.

5. Plan ahead of time for the transition of digital data to new storage media. Plan budgets for new storage and software technologies, file-format migrations, and time. Move data to new technologies before your storage media become obsolete.

6. Plan for transitions in data stewardship. If the data eventually will be turned over to a formal repository, institution or other custodial environment, make sure it meets the requirements of the new environment and that the new steward indeed agrees to take it on.

7. Determine the level of "trust" required when choosing how to archive data. Are the resources of the U.S. National Archives and Records Administration necessary, or will Google do?

8. Tailor plans for preservation and access to the specific needs of users. Gene-sequence data used daily by hundreds of thousands of researchers worldwide may need a preservation and access infrastructure that's different from the infrastructure needed, for example, for digital photos viewed occasionally by family members.

9. Pay attention to security. Be aware of what you must do to maintain the integrity of your data.

10. Know the regulations. Know whether copyright, the Health Insurance Portability and Accountability Act of 1996, the Sarbanes-Oxley Act of 2002, the U.S. National Institutes of Health publishing expectations, or other policies or regulations are relevant to your data. That way, you can make sure your approach to stewardship and publication is compliant.
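
For IT shops that want a concrete starting point on tips 4 and 9, one common approach is to record a checksum for every file and then check each copy against that record. The Python sketch below is an illustration only, not part of Berman's advice: the function names (sha256_of, build_manifest, verify_copy) and the choice of SHA-256 are assumptions made for the example, and a production archive would likely need more, such as scheduled re-checks and a protected copy of the manifest itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its size and checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def verify_copy(manifest: dict, copy_root: Path) -> list:
    """Return the relative paths that are missing or altered in a copy."""
    problems = []
    for rel, entry in manifest.items():
        candidate = copy_root / rel
        if not candidate.is_file() or sha256_of(candidate) != entry["sha256"]:
            problems.append(rel)
    return problems
```

In this sketch, build_manifest would be run once against the original data, and verify_copy would be run against each off-site or secondary copy; an empty list means every file in the manifest is present and unchanged. The manifest should be kept outside the directory it describes so that it is not hashed along with the data.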

