When storing a large set of digital assets, whether in a digital asset management system or by some other means, there is a lot to think about:
- Ingesting – Bringing digital assets into the system
- Governance – Security, lifecycle, access control
- Cataloging – Curation and searchability
- Dissemination – Retrieval
- Preservation – Guarding against loss
A good digital asset management system should be able to help with all of these aspects, but a tool is only as good as its user, and having even a rudimentary understanding of each goes a long way. I’ve found digital preservation to be one of the least understood topics when talking with clients. I want to give the reader a basic understanding of the two main problems digital preservation tries to solve.
Digital preservation is about preventing data corruption and obsolescence by ensuring the accessibility and authenticity of digital data.
Data corruption can be a major issue, especially when data is stored on magnetic tape or disk. Corruption typically shows up as bit rot, in which individual 1s and 0s flip at random because of magnetic fields or cosmic radiation passing through the storage device. The other cause of data loss is a catastrophic event such as a flood or a fire. Distributed data storage systems such as AWS S3 have largely solved this problem.
Solving both problems starts with duplicating the data on multiple physical devices and storing the copies in different geographic locations. This is built into most cloud data solutions. The other part of guarding against bit rot is recording fixity, usually by computing the file’s checksum and storing it in the file’s metadata. All copies of the file need to be checked periodically to confirm that their checksum still matches the stored value. If a copy no longer matches, replace it with one of the intact duplicates.
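The fixity check described above can be sketched in a few lines of Python. This is a minimal illustration, not a production workflow: the function names are hypothetical, SHA-256 is an assumed choice of checksum, and the copies are assumed to be local file paths rather than objects in a cloud store.

```python
import hashlib
import shutil


def compute_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted_copies(copies, recorded_checksum):
    """Return the copies whose current checksum differs from the recorded one."""
    return [p for p in copies if compute_checksum(p) != recorded_checksum]


def repair_copies(copies, recorded_checksum):
    """Overwrite each corrupted copy with an intact one; return what was repaired."""
    corrupted = find_corrupted_copies(copies, recorded_checksum)
    intact = [p for p in copies if p not in corrupted]
    if not intact:
        raise RuntimeError("no intact copy left to restore from")
    for bad in corrupted:
        shutil.copyfile(intact[0], bad)
    return corrupted
```

In this sketch the recorded checksum would be computed once at ingest and stored with the asset’s metadata, and `repair_copies` would run on a schedule against every copy.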
Obsolescence can occur in the wake of changing technology:
- When new file formats or encodings overtake the existing formats
- When the physical tools are no longer present to access or render the format
To solve this issue, we look at two types of solutions: file format migration and emulation.
File Format Migration
File format migration means converting a file from one format to another, such as converting a Microsoft Word document to an Archival Portable Document Format (PDF/A). This is the easiest preservation strategy to implement, but it is not without its challenges.
The first challenge is data loss. For text documents, the loss is commonly negligible and easily guarded against. For more complex assets such as images, sound, and video, the problem is harder: quality loss can occur between formats due to encoding, sampling-rate, or bit-rate differences. The key to protecting against quality loss is to capture key metrics before migration and store them in the asset’s metadata for post-migration validation. For a text document, examples of such metrics are the page, word, paragraph, sentence, and character counts. These characteristics can be compared programmatically when validating a migration. However, these metrics might not guard against loss of fonts, layouts, or character sets, which might be an issue if a legal department requires a light test. Gathering usable metrics may be easy for text assets but gets significantly harder for more complicated assets.
Another migration strategy is creating a risk profile for the migration. This means performing the migration on a sample set of the data assets and manually inspecting the samples to check for loss. If the results are unacceptable, a different target format, migration algorithm, or preservation strategy may be needed.
File format migration can also get complicated when the behavioral aspects of a file are important. This could show up as a macro or VB script in a Microsoft Word document or an interactive form in a PDF. It gets more complicated for other document types such as spreadsheets. Take the example of a spreadsheet in a pharmaceutical company. At face value, the data and the results are what matter, but imagine a lawsuit is brought against the company regarding the safety data captured in the spreadsheet. The court may need to evaluate the methodology and formulas used to calculate the results. If these behavioral aspects aren’t captured during a file format migration, then the line of reasoning and proof of due diligence are lost. Migrating only the data isn’t enough in this situation. This may lead us to emulation.
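As a rough illustration of metric-based validation for text assets, the sketch below counts paragraphs, words, and non-whitespace characters and reports any that differ after migration. It assumes the plain text has already been extracted from both the original and the migrated file (the extraction step depends on the formats involved and is not shown), and the function names are hypothetical.

```python
import re


def text_metrics(text):
    """Collect simple counts that can be stored as metadata before migration."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    words = text.split()
    return {
        "paragraphs": len(paragraphs),
        "words": len(words),
        # Count only non-whitespace characters so line-ending changes don't matter.
        "characters": sum(len(word) for word in words),
    }


def compare_metrics(before, after):
    """Return {metric: (before, after)} for every metric that changed."""
    return {
        name: (value, after.get(name))
        for name, value in before.items()
        if value != after.get(name)
    }
```

An empty result from `compare_metrics` means the counts survived the migration. As noted above, it says nothing about fonts, layout, or character sets.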
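Choosing the sample set can be automated even though the inspection itself is manual. The sketch below is one possible approach, with assumed defaults: the 5% fraction and the minimum of 10 assets are arbitrary illustrative values, and the function names and asset identifiers are hypothetical.

```python
import random


def select_sample(asset_ids, fraction=0.05, minimum=10, seed=None):
    """Pick a random subset of assets for a trial migration and manual review."""
    assets = list(asset_ids)
    size = min(len(assets), max(minimum, round(len(assets) * fraction)))
    # A fixed seed makes the selection repeatable for later audits.
    return random.Random(seed).sample(assets, size)


def failure_rate(review_results):
    """Share of reviewed samples that a reviewer marked as unacceptable.

    review_results maps asset id -> True (acceptable) or False (loss found).
    """
    if not review_results:
        raise ValueError("no review results recorded")
    failures = sum(1 for acceptable in review_results.values() if not acceptable)
    return failures / len(review_results)
```

What failure rate counts as unacceptable is a policy decision for the organization, not something the code can decide.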
This video shows the upgrade path of Microsoft Windows from Windows 3.1 to Windows 8. The video gives an example of how migration leads to a loss in behavior.
Emulation
In the case of spreadsheet behavior or other digital assets, like a piece of software or a game, emulation is often the only solution. Emulation means creating software that runs on modern hardware and simulates the original hardware the asset ran on. The issue with emulation is that it is often more expensive and time-intensive to implement. The advent of virtual machines has helped with these problems but requires managing additional digital assets and configurations. In some cases, custom emulators may need to be developed for embedded software, such as that of an arcade cabinet.
Another complication of emulation comes from validating the behavioral requirements. Think about all the quality assurance that goes into a piece of software. Emulating an environment that runs the software isn’t enough; the software needs to be validated to confirm that it behaves in the emulation as it did in the original environment.
Emulation is a complex subject. Entire careers, foundations, and museums are built around the discipline, and a full treatment is outside the scope of this article.
This article barely scratches the surface of the digital preservation aspect of digital asset management, but it highlights the main things digital preservation attempts to accomplish. Understanding these basics can help you evaluate a digital asset management system and develop your own strategy for storing and ultimately preserving your digital assets.
If you are evaluating a digital asset management solution or your asset governance strategy, Ten Mile Square can help. For more information, read about our experience helping clients with Digital Asset Management.