Rocky Mountain Born Digital Fever: Site visits in Denver and Boulder

As a mid-career professional in the LAM community, I have learned the value of site visits as a practical method for researching the tools and workflows of other digital archivists, within and outside of academia. While conference sessions and workshops provide an awareness of current trends in the profession, informal discussions outside of the convention room can foster close working relationships and inspire collaborative ventures, yielding the most bang for one’s buck. Site visits take this a step further by offering an opportunity to see the inner workings of another repository in real time. From this, one may gain a better understanding of the whys and hows in order to determine what, if anything, can be applied to one’s own work, something often difficult to envision in a more traditional classroom-based learning environment.

On July 14th and 15th, I conducted four site visits in Boulder and Denver to learn how other LAM professionals in Colorado are processing born digital content in their repositories.

I began the first day of my trip meeting with Kate Moomaw, Assistant Conservator of Modern and Contemporary Art at Denver Art Museum (DAM), to learn more about the museum’s born digital holdings, particularly the American Institute of Graphic Arts (AIGA) Design Archives, part of their Architecture, Design & Graphics collection. Both DAM’s Native Arts and Modern & Contemporary Arts collections include born digital content as well, but AIGA contains the largest and most diverse range of media formats and file types, thus serving as an excellent use case for the preservation and cataloging of born digital records across the institution.


Founded in 1914, AIGA is a professional organization for design advocates and practitioners. The AIGA Design Archives consists of over twenty thousand selections, dating from 1924 to the present day, from AIGA’s annual juried design competitions. Design Curator Darrin Alfred, Kate Moomaw, and Sarah Melching presented a paper last year titled “Exploding sodas, shrinking fruit, and yesterday’s CD-ROMS: Content and Conservation of the AIGA Design Archives at the Denver Art Museum” at a design conference in Germany.

DAM houses a portion of the materials while the rest are still with AIGA or have been transferred to the Rare Book and Manuscript Library at Columbia University’s Butler Library. Included among the approximately twelve thousand objects in the AIGA Design Archives at DAM are over seven hundred floppy disks, USB flash drives, and optical discs.

Last year, Kate Moomaw supervised former intern Eddy Colloton on a nine-week project cataloging and ingesting seventy unique born digital works into the museum’s digital repository. Colloton used a combination of BitCurator and Archivematica to image removable media; generate various types of metadata; identify, characterize, and normalize file formats; and create derivatives for preservation and access, which were packaged according to the BagIt specification. Colloton extracted metadata from the files with MediaInfo and ExifTool. Born digital materials were described in Lucidae’s product Argus, DAM’s collection management system.
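
For readers unfamiliar with BagIt, the Python sketch below shows roughly what packaging according to the specification involves: the payload goes in a data directory, a short bagit.txt declaration sits beside it, and a manifest lists a checksum for every payload file. This is not Colloton’s actual workflow, which relied on BitCurator and Archivematica. The function names are hypothetical, and the sketch assumes BagIt version 1.0 with SHA-256 checksums purely for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into a new bag at bag_dir (hypothetical helper).

    bag_dir must not already contain a data directory.
    Returns the path to the payload manifest.
    """
    # The payload always lives under data/ inside the bag.
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)

    # Bag declaration; version 1.0 is an assumption made for this sketch.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One manifest line per payload file: checksum, then path relative to the bag.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")

    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest
```

Recomputing the checksums later and comparing them against the manifest is how a repository can confirm that nothing in the package has changed since ingest.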


More details on the project, including a complete list of works that were cataloged and ingested into DAM’s digital repository, are posted on Eddy Colloton’s blog.

On the second day of my trip, Lori Emerson, Associate Professor of English and the Intermedia Arts, Writing, and Performance Program at the University of Colorado at Boulder (CU Boulder), gave me a tour of the Media Archaeology Lab (MAL). Founded by Emerson in 2009, MAL provides a space for “hands-on, cross-disciplinary experimental research, teaching, and artistic practice using still-functioning but obsolete tools, software, and hardware.”

Although MAL collects obsolete machinery and instruments of all types, from typewriters and electronic word processors to magic lanterns and phonographs, the focus of my visit was to explore the variety of legacy computer hardware. The largest digital media lab of its kind in North America, MAL traces the history of computing beginning in the 1970s, highlighting landmark moments in technology.


MAL also holds a number of software programs, including video games, interactive fiction, and electronic literature. Their collection of e-lit includes works by poets Judy Malloy, Stephanie Strickland, and bpNichol in addition to authors Paul Zelevansky, Deena Larsen, and Ian Bogost.

I was impressed by how many of these computers were up and running for artists, students, and other researchers across disciplines. Emerson has built a lab where visitors can engage in an authentic experience with born digital content in its native environment, right down to the feel of the keyboard and mouse under their fingertips.

Coming from an archives background, I found it enlightening to see firsthand how those in other sectors, within and outside of the LAM community, approach the preservation of and access to born digital content. It also brought me to the realization that the challenges we all face, including technological obsolescence and bit rot, lack of infrastructure, and lack of fiscal support from internal or external stakeholders, could be the nexus for professionals across disciplines. Those of us dealing with these issues cannot continue to silo our knowledge and work within a vacuum, i.e., our own distinct professional communities. Collaboration is a necessity.


Call for Speakers – SAA Panel Discussion on Processing Hybrid Collections

My colleague, Laurie Rizzo, and I are putting together a panel on processing hybrid archival collections (analog + born-digital/e-records) for the Society of American Archivists annual meeting next year in Cleveland, Ohio. We are seeking three to four additional speakers for an informal panel discussion on the subject. Our aim is to provide attendees with practical methods for arranging and describing digital records within a larger mixed format collection that they can easily apply to their own workflows. This topic is quite timely considering The Signal’s recent interview with Sibyl Schaefer.

Below is a draft of our abstract, which will be revised as speakers are added to the panel. Please contact me at aadams at hagley dot org if you are interested in participating!

The Evolution of “Traditional” Formats: Twenty-first century hybrid collection processing

Archivists who process materials created within the last three decades are eventually confronted with the task of providing access to born-digital content. Often, this is within the scope of a larger physical collection. The concept of “traditional” archival formats is disappearing, and the old excuse “I’m not good with computers” is no longer acceptable. Even archivists who largely process analog manuscripts and still images will in time find it necessary to create descriptions for either born-digital or digital derivatives within the scope of a finding aid. Not to fear, electronic records are merely a “virtual” representation of the archival formats we are already familiar with. Their description follows the same standards the archival profession uses for all other formats.

Speakers will discuss workflows for accessioning, digital preservation, arrangement, description, and access. Emphasis will be placed on maintaining consistency and providing context by assigning descriptive work to the archivist(s) responsible for processing the physical collection. Examples of finding aids for hybrid collections will be provided, with detail on following standards, including DACS and DCRM(G), to create physical descriptions for born-digital content.



What Sweet Irony: Outsourcing Backup Tape Data Retrieval

Below is an extended version of the lightning talk I presented at the Society of American Archivists’ annual meeting on August 16, 2014.

The repository I work in, Hagley Museum and Library, houses materials created by a variety of businesses and organizations, some defunct, others still active. Due to the proprietary nature of the business records we collect, in addition to the privacy and security concerns of active corporations, many of our collections are on deposit or closed for twenty-five years or more.

In 2012, Hagley received a large hybrid collection, consisting primarily of textual analog materials, in addition to a number of born-digital records. The records were created by various tech corporations during the normal course of business in the late 1990s and early 2000s and document aspects of the dot-com boom and bust, an area of research where primary sources are sorely lacking.

Even though the collection is closed for twenty-five years from date of creation, I could not let the electronic records sit on a shelf untouched during that time, as one might with paper records. Given the potentially high research value of the collection, I decided the preservation of its born-digital content was a top priority, particularly since much of it resided on physical media that is already at risk of loss. With the assistance of a few coworkers, I culled hundreds of record cartons to discover the following obsolete media formats: 349 compact discs; 134 3.5” floppy disks; 113 digital linear tapes (DLT); 49 digital data storage tapes (DDS); 19 quarter-inch mini cartridges; 15 Travan cartridges; and 8 zip disks.

Although the CDs and floppy disks presented few problems, the remaining obsolete formats offered a lesson in how complex data recovery can be. My attempts to use “freecycled” drives and jerry-rig old PCs were just not working. Even if I could connect a computer to the exact generation DLT or DDS drive to read the tapes, I would also need to know the software program used to create the backup, which could vary widely depending on the date of creation, then successfully install it, and cross my fingers the media was not encrypted or corrupt.

Since Hagley is a small shop with limited in-house resources, it was clear to me that outsourcing the data extraction was the best course of action. After consulting several vendors, I found a company that specializes in data extraction and indexing of backup tapes. The vendor’s office was close enough that I could make an in-person visit to the digital lab and test out data retrieval on a few backup tapes, free of charge. Although the lab is not set up to read Travan or quarter-inch mini cartridges, the vendor successfully read the DLT and DDS tapes I brought.

After establishing a budget for the first phase of the project, I sent the vendor a sample consisting of five DLT and three DDS tapes. Less than a week later, the vendor provided me with access to the indexed data from seven out of eight tapes. After a brief training session, I was able to access the content in the vendor’s hosted system via a web browser where I could eliminate duplicates, search item-level full-text and metadata, and filter content by file type, format, and date. I then tagged data of potentially high research value for download. Due to the size of the collection, I was strict with appraisal, retaining only about ten percent of the data. The original media was returned to Hagley a few weeks later. Having successfully completed the first phase of the project, we will continue to use the same company for the remaining tapes.
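
For repositories that receive extracted files directly rather than through a hosted system, the two most mechanical steps above, eliminating duplicates and filtering by file type, can be approximated with a short script. The Python sketch below is an assumption-laden illustration, not the vendor’s method: the function names are hypothetical, duplicates are defined as files with identical SHA-256 checksums, and file type is judged by extension alone.

```python
import hashlib
from pathlib import Path
from typing import List, Optional, Set


def file_digest(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_files(root: Path, extensions: Optional[Set[str]] = None) -> List[Path]:
    """List files under root, skipping byte-for-byte duplicates.

    If extensions is given (lowercase, with leading dots, e.g. {".doc"}),
    only files with those extensions are considered. Hypothetical helper.
    """
    seen = set()
    kept = []
    # Sorting makes the result repeatable: the first path in sort order
    # is the copy that is kept when duplicates exist.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if extensions is not None and path.suffix.lower() not in extensions:
            continue
        digest = file_digest(path)
        if digest in seen:
            continue  # identical content already kept under another name
        seen.add(digest)
        kept.append(path)
    return kept
```

A script like this only narrows the pile. Deciding which of the remaining files have research value is still an appraisal judgment for the archivist.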

In conclusion, here are a few key points to consider when outsourcing data recovery and retrieval. First, ask yourself if the data is even worth recovering. Not all collections are created equal and neither are all born-digital records. Next, do you have the in-house resources to read and extract the data to a secure storage area? Even if the answer is no, this does not mean you should immediately search for a vendor. Instead, consider the short-term and long-term costs of performing the data retrieval in-house. Perhaps the fiscal and temporal costs to your repository are sustainable. Remember that such costs include purchasing, installing, and maintaining equipment and software; training yourself and other employees to use the system; and perhaps even hiring a new staff member. How often do you anticipate using the system in the near future? If the data resides on a very rare and expensive media format your repository will likely never encounter again, it may not be worth the time and effort to do in-house. More importantly, before turning to a vendor, consider collaborating with another organization or institution to retrieve the data. They may have equipment and resources you need and vice versa. Finally, if you do decide to outsource, research and compare vendors; get quotes; read the vendor agreement carefully before committing; and always send a sample first.