Digital Records in Special Collections

Documentation of digital preservation efforts in UB Libraries Special Collections.
Last Updated: Nov 12, 2024 10:54 AM

Introduction

Archival processing occurs over several phases. Steps include communicating with donors, accessioning, gathering information about the records, reviewing the records for preservation needs and sensitive information, establishing an arrangement, describing the records, and providing access. There are many degrees to which archivists process records, some of which are more intensive than others, but all of which should be done well. Digital records require a broad approach due to the large quantity of files in any given accession. UB Special Collections has chosen to adapt and reuse the Digital Processing Framework as a guideline for processing our born-digital collections. This framework, developed by ten archival practitioners after a meeting at the 2016 Born Digital Archiving eXchange unconference, is adaptable, reusable, and assists processors in the various stages of born-digital processing.

Broadly defined processing steps are outlined below. Processing checklists for minimal processing, moderate processing, and intensive processing are also available for download. These three processing tiers correspond to levels outlined in Guidelines for Efficient Archival Processing in the University of California Libraries and UB Special Collections' Archival Processing Documentation.

Working with Donors

  • Share the Donor Guide and walk through the checklist carefully
  • Clarify expectations regarding digital preservation and access
  • Ask the donor the following questions to gain a broad understanding of the digital records
    • Location of computer workstations (on campus or at home)
    • Removable storage media (hard drives, USB flash drives, CDs, etc.)
    • Approximate date range of files
    • File types
    • Approximate quantity
  • Conduct an initial survey of born-digital materials:
    • Determine the appropriate survey method by consulting with the donor
    • Conduct onsite surveys (in person) when possible
  • Before acquisition, ask if the collection includes legally protected files, such as educational, medical, financial, criminal, attorney-client, and personnel records
  • Inform the donor that the presence of sensitive information about individuals other than the donor or creator may limit access to their own files
  • Make clear who will screen email messages for sensitive information and what the process will be. *Refer to Section VIII. Working with Email for further instruction*
  • Ask the donor to complete the Privacy checklist in the Donor Guide
  • Ask the donor whether the donor’s files contain other people’s intellectual property
  • Discuss any rights held by individuals other than the donor or creator that would limit Special Collections’ ability to provide researchers with access to, and delivery of, the files
  • Discuss the methods by which Special Collections will provide researchers with access to the digital files
  • Identify which born-digital materials are to be offered to Special Collections.
  • Ask the donor to document the ways in which digital media and files have been stored, accessed, and transported prior to their arrival or collection by Special Collections.
  • Complete the table below with the donor. Indicate which file types are included in the donation:
    • Audio
    • Raster images
    • Databases
    • Raw Camera images
    • Email* (see additional information below)
    • Spreadsheets
    • Open Office
    • Vector images
    • Plain Text
    • Video
    • Portable Document Format
    • Word Processing files
    • Presentation files (e.g., PowerPoint)
    • Other: __________________________

The deed of gift acknowledges transfer of ownership to Special Collections.

  • Review the Deed of Gift
    • Transfer of copyright
    • Permission to make preservation and access copies
    • Permission to display online
    • Disposition of duplicate or unneeded materials
    • Disposition of computer hardware, removable media and files not retained
  • The donor and Special Collections staff representative sign the Deed of Gift
  • Provide a copy of the signed deed to the donor

Supporting Documentation

Storage Media Inventory

Donors may choose to transfer digital records via removable storage media. If so, before Special Collections staff copy the digital records from the physical media, it is important to collect as much information as possible about the storage media itself.

  • Record the following information in a media inventory spreadsheet. Name this file [accession number]-media-inventory.xlsx. Save this spreadsheet in the accession file. 
    • Accession number – the accession number assigned to the collection
    • Unique ID – a unique ID assigned to each piece of storage media (e.g., accession number plus a sequential number: 2017-001-001; 2017-001-002)
    • Label information – a transcription of any pre-existing labels on the media
    • Date(s) – the creation date or date range of the files stored on the media
    • Storage media type – the type of media
    • Storage capacity – maximum storage capacity of the media, expressed in gigabytes
    • Physical dimensions – dimensions of the media, expressed as height x width
    • Manufacturer – manufacturer of the media
    • Model/Series – if known, the model and series of the media
    • External parts – list of any other parts that came with the media (cables, chargers, etc.)
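The inventory fields above map naturally onto one row per piece of media. The following is a minimal sketch of that structure, not a Special Collections tool: the class and function names are hypothetical, the three-digit zero padding of the sequence number is an assumption, and the output is CSV (Python's standard library does not write .xlsx), which can be opened in Excel and saved under the required file name.

```python
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class MediaItem:
    """One row of the storage media inventory (field names are illustrative)."""
    accession_number: str
    unique_id: str
    label_information: str
    dates: str
    storage_media_type: str
    storage_capacity_gb: float
    physical_dimensions: str
    manufacturer: str
    model_series: str
    external_parts: str


def make_unique_id(accession_number, sequence):
    # Assumption: the sequential number is zero-padded to three digits.
    return f"{accession_number}-{sequence:03d}"


def write_inventory(items, handle):
    """Write the inventory rows, with a header, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(MediaItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
```

To use it, build one `MediaItem` per drive or disc, open the output file with `open(path, "w", newline="")`, and pass the handle to `write_inventory`.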

Related Tools

An important part of working with digital records is ensuring that they are free of viruses, especially because these records will be placed in long-term preservation storage. Special Collections runs virus checks on all donated digital records with Microsoft Defender.

  1. Right click on the folder containing working copy files.
  2. From the menu, select "Scan with Microsoft Defender..." This will begin a custom scan, and Microsoft Defender will open in a new window.
  3. Watch the progress of the scan at the top of the window under "Scan options."
  4. Record the results of the scan in a new column in your Storage Media Inventory.
  5. Should any viruses or threats be detected, stop the workflow and reassess the donation.

Instructions for creating a file manifest with DROID

  1. Open the DROID application. A new profile is created automatically
  2. Select Add and browse to the digital records in the accession folder
  3. Select Start. DROID scans the directory and runs the file identification process
  4. Expand the directory to view file details (folder names, file names, extension, size, last modified date, format, PUID, hash value)
  5. Select Export. A new window will open. Choose the profile, then select Export profiles...
  6. Under Encoding, select .csv. Name this file [accessionnumber]-file-manifest.csv and save it in the accession folder
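To make the contents of a file manifest concrete, here is a minimal sketch that records a few of the same details (path, name, extension, size, last modified date, hash) for every file under a folder. It is not DROID and performs no format identification, so it produces no PUID or format name. The function names, the column names, and the choice of SHA-256 are assumptions made for illustration.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["FILE_PATH", "NAME", "EXT", "SIZE", "LAST_MODIFIED", "HASH"]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return one dictionary per file found under root, sorted by path."""
    rows = []
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "FILE_PATH": str(path),
            "NAME": path.name,
            "EXT": path.suffix.lstrip(".").lower(),
            "SIZE": stat.st_size,
            "LAST_MODIFIED": modified.isoformat(),
            "HASH": sha256_of(path),
        })
    return rows


def write_manifest(rows, csv_path):
    """Write the manifest rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

A sketch like this can serve as a cross-check on file counts and hashes, but the DROID export remains the manifest of record because it carries the format identification.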

Sometimes files are missing extensions or have extensions that do not match their format. When this occurs, DROID issues a file extension mismatch warning.

  1. Use the LocateOpener application to identify and append missing file extensions
    • Right click on files missing extensions or on files with unfamiliar extensions
    • Select LocateOpener from the menu
    • If the file has an unknown extension, select “Scan with TrID”
    • A new window will recommend an appropriate file extension
    • Select “Append extension”
    • Rescan the accession to create an updated file inventory with amended file extensions
  2. Using the inventory, identify archive formats (gzip, bzip, wbzip2, zip, jar, tar) and unzip them prior to ingest into Preservica
  3. If unzipping a package that contains one or more mbox files, copy the mbox file(s) to the appropriate location in the accession folder, then delete the unzipped folders
  4. Remove empty directories
  5. Remove .DS_Store and Thumbs.db files
  6. If file extensions required amendment and/or archive formats were unzipped, create an updated file inventory by following the Inventory instructions above. Replace the file-inventory.csv file with the updated version
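The cleanup steps above (finding files without extensions, removing system housekeeping files, and removing empty directories) can be sketched as follows. This is an illustration under stated assumptions, not part of the documented workflow: the function names are hypothetical, and the junk file names are assumed to be .DS_Store (macOS) and Thumbs.db (Windows). The second function deletes files, so it should only ever be pointed at working copy files.

```python
import os
from pathlib import Path

# Assumption: these are the housekeeping files to discard.
JUNK_NAMES = {".DS_Store", "Thumbs.db"}


def files_missing_extension(root):
    """List files with no extension, ignoring hidden dot-files."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix == "" and not p.name.startswith(".")
    )


def remove_junk_and_empty_dirs(root):
    """Delete junk files, then any directory left empty; return what was removed."""
    root = Path(root)
    removed = []
    # Walk bottom-up so children are handled before their parent directory.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        current = Path(dirpath)
        for name in filenames:
            if name in JUNK_NAMES:
                (current / name).unlink()
                removed.append(current / name)
        # Check the directory's actual state; never remove the root itself.
        if current != root and not any(current.iterdir()):
            current.rmdir()
            removed.append(current)
    return removed
```

The bottom-up walk matters: a folder that held only a .DS_Store file, or only empty subfolders, becomes empty partway through the run and is then removed as well.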

As with paper records, donors may unknowingly transfer collections that contain duplicate files. Duplicate files are identified by comparing each file’s checksum. Files with the same content have the same hash value.

  1. In Excel, open the file inventory .csv file
  2. Highlight the HASH column
  3. Select the Home tab > Conditional Formatting > Highlight Cells Rules > Duplicate Values
  4. Choose a highlight color
  5. Excel will highlight all duplicate values
  6. Return to the working copy files in the accession folder
  7. Locate the duplicate files and move the files to a new Duplicates folder in the accession file
  8. Retain this folder until Preservica ingest is complete
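The same comparison can be done outside Excel by grouping the rows of the file inventory by hash value. The sketch below is illustrative only: the function name is hypothetical, and the column names `HASH` and `FILE_PATH` are assumptions, so check the header row of your own export and pass the actual names.

```python
import csv
from collections import defaultdict


def find_duplicates(csv_path, hash_column="HASH", path_column="FILE_PATH"):
    """Return {hash: [paths]} for every hash that occurs more than once."""
    groups = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            digest = (row.get(hash_column) or "").strip().lower()
            if digest:  # rows without a hash (such as folders) are skipped
                groups[digest].append(row[path_column])
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Each entry in the result is a set of files with identical content. Deciding which copy to keep and which to move to the Duplicates folder remains a processing decision.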

As with traditional paper collections, digital records may contain sensitive or confidential information that is protected under federal or state “right to privacy” laws, including but not limited to certain education, medical, financial, criminal, attorney-client, and personnel records. Special Collections staff takes care to identify and, in some cases, remove personally identifiable information (PII) found within all archival collections.

Special Collections uses a tool called Spirion to search for and identify PII in digital collections.

  • Run the Spirion software on the working copy files
  • Conduct this step even if donors say there is no PII in their files
  • Open Spirion software and log in
  • Under “Start,” select “Start Search Wizard”
  • Under “AnyFind Searching” select the personally identifiable information (PII) data types you want to search for. The default types are Social Security Numbers, Credit Card Numbers, and Password Entries, but Special Collections also searches for Bank Account Numbers, Driver Licenses, Passport Numbers, and Health Info. 

Screenshot of the AnyFind menu in the Spirion Search Wizard.

  • Select “Next”
  • Select the Locations (usually “Files and Compressed Files” and "E-Mails and Attachments")
  • Under “File Locations” browse to the custom location. This is the accession folder or specific folder(s) in the accession folder you want to search

Screenshot of the "Locations" menu of the Spirion Search Wizard.

  • Review and confirm the selection, select “Finish”

Screenshot of the "Confirmation" menu in the Spirion Search Wizard.

  • Select “Save As” to generate a report. For each file that contains PII, the report lists the file path (location), date modified, size, PII category, match, number of hits, and classification information
  • Add two additional columns to the end of the report: Action and Ingest
  • Record any action taken on these files in these fields
  • Save this report as an Excel file in the accession folder
  • Return to Spirion and review the files that contain PII
  • In consultation with the processing archivist, determine whether or not these files need action
  • Under the Actions menu, select “Redact” or “Shred”*

Redact – When a file contains sensitive information and you wish to keep the file but remove the sensitive information only, you should utilize the redact feature.

Shred – When a file contains sensitive information and you wish to remove the file from the accession, you should utilize the shred feature. Using the shred feature will delete the entire file.

Screenshot of the main menu of Spirion with results.
 

  • Update the Excel report file with all actions taken, including if files were skipped because the information was determined to not be PII. Also note if the file was re-ingested into Preservica**

*Spirion only allows actions on certain file types and versions; see the Spirion documentation for a complete list. It may be necessary to migrate files to newer versions or different formats before taking action. Always work on a working copy file, never on the original.

**If PII is identified after ingest into Preservica, download the file(s), follow the Spirion steps above, re-ingest the files using the Preparation and Upload Tool, and then delete the original unredacted files in Preservica.
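For readers unfamiliar with how pattern-based PII searching works in general, the sketch below looks for two of the data types named above in a string of text: Social Security numbers written as ###-##-####, and 13 to 16 digit card numbers that pass the Luhn checksum. It is a simplified illustration and not a description of how Spirion works. The patterns and function names are assumptions, and a real scan must also handle file formats, context, and false positives.

```python
import re

# Assumption: SSNs appear in the hyphenated ###-##-#### form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def scan_text(text):
    """Return (category, match) pairs: SSN matches first, then card numbers."""
    findings = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    findings += [
        ("card", m.group())
        for m in CARD_PATTERN.finditer(text)
        if luhn_valid(m.group())
    ]
    return findings
```

The Luhn check is what separates a plausible card number from an arbitrary run of digits, which is one reason reported matches still need review by the processing archivist before any action is taken.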

Preservation

Special Collections uses Preservica, a suite of OAIS (Open Archival Information System) compliant workflows, to manage our digital records. This system assists with the accessioning, processing, and preservation of our digital materials.

Special Collections’ primary strategy for preservation is to normalize files for preservation and presentation. The Digital Preservation Coalition defines digital preservation as “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary…and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological and organisational change.”[1] The DPC defines migration as “a means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration differs from the refreshing of storage media in that it is not always possible to make an exact digital copy or replicate original features and appearance and still maintain the compatibility of the resource with the new generation of technology.”[2]

The workflow for ingesting and preserving digital records in Preservica varies depending on a number of factors: types of files in the accession, size and arrangement of the accession, and the means of physical transfer. For example, donors transferring their files remotely by web application will use a different application than Special Collections staff who have received a donation of digital records via removable storage (e.g., a thumb drive). Donations of large quantities of digital records are uploaded to Preservica in batches to mitigate the time it takes to run normalization workflows on the entirety of the donated materials. Run normalization workflows either during or after ingest into Preservica, depending on the type of file.

[1] http://www.dpconline.org/handbook

[2] http://www.dpconline.org/handbook/glossary#M

Review the file formats identified by DROID and make note of at-risk file types. Identify the target preservation formats, access formats, and pathways if running normalization workflows on ingest into Preservica.

Type
Preferred Formats
Acceptable Formats
Document
  • Portable Document Format Archival (.pdf)
  • OpenDocument Text (.odt)
  • Portable Document Format (.pdf)
  • OpenDocument Format (.odf)
  • Rich Text Format (.rtf)
  • Text (.txt)
Presentation
  • OpenDocument Presentation (.odp)
  • Portable Document Format Archival (.pdf)
  • Portable Document Format (.pdf)
  • OpenDocument Format (.odf)
Dataset
  • OpenDocument Spreadsheet (.ods)
  • Comma Separated Values (.csv)
  • Text (.txt)
  • Portable Document Format (.pdf)
Image
  • Tagged Image File Format (.tiff)
  • JPEG (.jpg)
  • JPEG 2000 (.jp2)
  • Scalable Vector Graphics (.svg)
  • Portable Network Graphics (.png)
  • Graphics Interchange Format (.gif)
  • Bitmap (.bmp)
Moving Image
  • Matroska (.mkv)
  • Interoperable Master Format/Material Exchange Format (.mxf)
  • FFV1
  • Audio Video Interleave (.avi)
  • MPEG-4 (.mp4)
Sound
  • Waveform Audio File Format (.wav)
  • Broadcast Wave Format (.bwf, .wav)
  • Audio Interchange File Format (.aiff)
  • MPEG-1 Audio Layer III (.mp3)
Web Resource
  • Web Archive File (.warc)
  • Web Archive Collection Zipped (.wacz)
 
Email
  • Electronic Mail Format (.eml)
  • Personal Storage Table (.pst)
  • Portable Document Format (.pdf)
  • Message format (.msg)