Digital Records in Special Collections

Documentation of digital preservation efforts in UB Libraries Special Collections.

Last Updated: Sep 30, 2025 10:28 AM

Background and Goals

Special Collections aims to preserve and provide access to all digital material in our collections, including websites. Our web archiving efforts began in 2020 with the University at Buffalo Novel Coronavirus (COVID-19) Web Archive project, which documented the University at Buffalo’s response to the COVID-19 pandemic through its online presence. Because the web is inherently ephemeral, and because more and more communication now happens online, the University at Buffalo is developing a more robust web archiving practice to capture web content that falls within Special Collections’ collecting policies.

What We Capture

The University at Buffalo University Libraries captures web content falling into a variety of categories outlined in the collecting policies of the University Archives, Poetry Collection, and Robert L. Brown History of Medicine Collection. This includes both websites created by the University at Buffalo and its constituents and web content created by outside entities, including donors and associated organizations.

Types of web content captured by the University at Buffalo University Libraries:

Historically significant University at Buffalo websites and webpages
Websites of faculty and notable alumni represented in our collections
Websites of poets, artists and other figures represented in our collections
Websites of community organizations represented in our collections
Websites of student organizations
Online student publications

At this time, University Libraries does not capture social media, nor do we attempt to capture the entirety of the University at Buffalo website. We do not capture websites hosted outside the University at Buffalo's domain unless they fit into our collecting policies and we have express permission from the creator(s).
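The scoping rules above can be expressed as a small decision function. This is a minimal sketch, not a Special Collections tool: the helper names are hypothetical, it assumes that "the University at Buffalo's domain" means buffalo.edu and its subdomains, and it leaves the collecting-policy and permission judgments to a person by passing them in as booleans. It does not model the exclusion of social media.

```python
from urllib.parse import urlsplit

# Assumption: "the University at Buffalo's domain" = buffalo.edu and its subdomains.
UB_DOMAIN = "buffalo.edu"


def is_ub_hosted(url: str) -> bool:
    """True if the URL's host is buffalo.edu or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == UB_DOMAIN or host.endswith("." + UB_DOMAIN)


def may_capture(url: str, *, fits_collecting_policy: bool, has_permission: bool) -> bool:
    """Hypothetical scope check mirroring the written policy (human judgments passed in)."""
    if not fits_collecting_policy:
        return False  # nothing outside the collecting policies is captured
    if is_ub_hosted(url):
        return True  # UB-hosted content within policy
    return has_permission  # off-domain: also needs the creator's express permission
```

For example, `may_capture("https://www.buffalo.edu/news.html", fits_collecting_policy=True, has_permission=False)` returns `True`, while the same call for an off-domain site returns `False` until permission is recorded. Matching on the host suffix (rather than a substring) keeps look-alike hosts such as `notbuffalo.edu` out of scope.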

Capture, Access and Discovery

University Libraries uses a variety of tools to capture websites, maintain archived web content, and provide access to our web archives. We have been a member of the Archive-It user community since 2023 and use Archive-It, which crawls with the Internet Archive’s Heritrix software, to capture, preserve, and provide access to web content falling into our collecting policies.

We also use Conifer, Rhizome’s web-based archiving tool, to capture simple webpages, and we provide access to those captures through Archive-It or our digital preservation system, Preservica.

All web content captured using any of the technologies described above is also described and discoverable in our finding aids database.

Capturing Websites with Conifer

You must have a Conifer account to capture websites with Conifer. Go to https://conifer.rhizome.org/ and click “Create a Free Account” to sign up using your UB email address.

If you already have an account, click “Log In” in the top right corner of the screen. Once you log in with your email and password, you will be redirected to a “New Capture” page.

1. Copy and paste the URL of the website you would like to capture into the box labeled “URL to capture.”
2. Click the “Add to collection” dropdown menu. Either select an existing collection or click “Create new collection.”
   a. The title of the collection will become the name of the .warc file you eventually download, so name the collection according to file naming best practices. Include the collection or accession number and a meaningful title for the website (e.g., PCMS-0130-BD-name-magazine-2023).
3. Under “Select browser,” choose “Use Current Browser.”
4. Optionally, add any notes about this session under “Session settings.”
5. Click “Start Capture.”
6. In the top left corner, watch the “Capturing” button until the recording symbol stops blinking and the number of megabytes stops changing.
7. You must open any subpages or embedded links during this session in order to capture them. Opening each embedded link or subpage you would like to capture in a new tab can help you keep track of what you have already captured.
8. Repeat step 6 for each page.
9. Once you have captured all the pages you would like to archive, click the “Capturing” button to stop recording. You will be redirected to the collection page.
10. Check the page titles and URLs to make sure you have captured everything you need. If you have additional pages to add, click “New Session.”
11. When you are ready to export the captured website, click the three dots next to the “New Session” button.
12. From the dropdown menu, click “Download Collection.”
13. Navigate to your downloads folder. Click the WARC file you just downloaded and unzip it.
14. Add the “.warc” extension back onto the unzipped file.
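Unzipping the download and restoring the .warc extension can also be scripted, together with a quick listing of which URLs ended up in the file as a double check on the page titles and URLs reviewed in Conifer. This is a minimal Python sketch using only the standard library. It assumes the Conifer download is a gzip-compressed file whose name ends in .gz; the function names are hypothetical, and the record reader is a simplified parser for spot checks, not a replacement for a full WARC library.

```python
import gzip
import shutil
from pathlib import Path


def finalize_warc(download: Path) -> Path:
    """Decompress a gzip-compressed download into a file ending in .warc."""
    name = download.name
    if not name.endswith(".gz"):
        # Refuse, so the output path can never be the input path.
        raise ValueError(f"expected a .gz download, got {name}")
    stem = name[:-3]
    if not stem.endswith(".warc"):
        stem += ".warc"  # restore the .warc extension
    out = download.with_name(stem)
    # gzip.open reads concatenated (multi-member) gzip streams, as used for WARCs.
    with gzip.open(download, "rb") as src, open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out


def warc_target_uris(path: Path) -> list[str]:
    """Return the WARC-Target-URI of each response record (simplified reader)."""
    uris = []
    with open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                break  # end of file
            if not line.strip():
                continue  # blank lines that end each record
            if not line.startswith(b"WARC/"):
                raise ValueError(f"not a WARC record header: {line[:40]!r}")
            headers = {}
            while True:
                raw = f.readline()
                if raw.strip() == b"":  # blank line (or EOF) ends the header block
                    break
                key, _, value = raw.decode("utf-8", "replace").partition(":")
                headers[key.strip().lower()] = value.strip()
            # Skip the record body, whose size is given by Content-Length.
            f.seek(int(headers.get("content-length", "0")), 1)
            if headers.get("warc-type") == "response":
                uris.append(headers.get("warc-target-uri", "").strip("<>"))
    return uris
```

For example, `warc_target_uris(finalize_warc(Path("PCMS-0130-BD-name-magazine-2023.warc.gz")))` would write `PCMS-0130-BD-name-magazine-2023.warc` next to the download and list the captured page URLs (the input file name here is illustrative). Refusing non-.gz input is a deliberate choice: it prevents the script from overwriting the file it is reading.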

Ingest to Preservica

1. Create Dublin Core metadata as usual and save it as OPEX.
2. Ingest the .warc file and metadata using the Preparation and Upload Tool. Make sure the security tag is set to “Open.”
3. Once ingest has completed, navigate to the asset in Explorer. Right-click the asset and select “Properties.”
4. Click “Edit” in the bottom right corner.
5. Click “Add Identifier.” Under the “Type” column, add “URL.” Copy and paste the URL of the captured website into the “Value” column.
6. Click “Save.”
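The OPEX metadata for the first ingest step can be generated rather than written by hand. The sketch below builds a minimal OPEX sidecar that embeds simple Dublin Core in its DescriptiveMetadata section. It rests on several assumptions to verify against the OPEX schema and your Preservica configuration: the namespace URIs shown, the `<file>.warc.opex` sidecar naming, a lowercase `open` security tag, and whether an `Identifier` of type `URL` supplied at ingest is kept (if it is not, add the URL identifier manually as described above). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace URIs as commonly used for OPEX and OAI Dublin Core (verify for your setup).
OPEX_NS = "http://www.openpreservationexchange.org/opex/v1.0"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

for prefix, uri in (("opex", OPEX_NS), ("oai_dc", OAI_DC_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)  # readable prefixes in the written XML


def build_opex(title: str, captured_url: str, dc_fields: dict[str, str],
               security_tag: str = "open") -> ET.ElementTree:
    """Assemble a minimal OPEX document with embedded simple Dublin Core."""
    root = ET.Element(f"{{{OPEX_NS}}}OPEXMetadata")
    props = ET.SubElement(root, f"{{{OPEX_NS}}}Properties")
    ET.SubElement(props, f"{{{OPEX_NS}}}Title").text = title
    # Assumption: the security tag is named "open" in this Preservica instance.
    ET.SubElement(props, f"{{{OPEX_NS}}}SecurityDescriptor").text = security_tag
    ids = ET.SubElement(props, f"{{{OPEX_NS}}}Identifiers")
    # Assumption: a "URL" identifier supplied here is kept at ingest.
    ET.SubElement(ids, f"{{{OPEX_NS}}}Identifier", type="URL").text = captured_url

    desc = ET.SubElement(root, f"{{{OPEX_NS}}}DescriptiveMetadata")
    dc = ET.SubElement(desc, f"{{{OAI_DC_NS}}}dc")
    for element, value in dc_fields.items():  # e.g. {"title": ..., "date": ...}
        ET.SubElement(dc, f"{{{DC_NS}}}{element}").text = value
    return ET.ElementTree(root)


def write_sidecar(warc_path: Path, tree: ET.ElementTree) -> Path:
    """Write the OPEX next to the WARC as <file>.warc.opex (assumed convention)."""
    sidecar = warc_path.with_name(warc_path.name + ".opex")
    tree.write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar
```

A typical call would be `write_sidecar(Path("PCMS-0130-BD-name-magazine-2023.warc"), build_opex("Name Magazine website", "https://example.org/", {"title": "Name Magazine", "date": "2023"}))`, with the file name and URL standing in for your own.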

For more information on using Archive-It, please consult the Internet Archive's Archive-It Help Center. There, you will find comprehensive guides and video tutorials on using Archive-It to capture and provide access to web content.

Permissions, Copyright and Privacy

Most web content captured by Special Collections is created by the University at Buffalo, which already holds the copyright to it. Special Collections must first obtain permission before capturing and providing access to webpages and websites created and hosted by entities not directly affiliated with the University at Buffalo. Copyright remains with the creator and is not transferred to the University at Buffalo upon capture. We also honor robots.txt exclusions, unless ignoring them is necessary for capture and the creator of the website has expressly permitted it.
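The robots.txt rule can be checked before a crawl with Python's standard `urllib.robotparser` module. This is a minimal sketch: `HypotheticalArchiver` is a placeholder user-agent string, not the agent Archive-It or Conifer actually sends, the function names are hypothetical, and the two keyword flags record human decisions (whether ignoring robots.txt is necessary and whether the creator has expressly permitted it).

```python
from urllib import robotparser


def parser_from_lines(lines: list[str]) -> robotparser.RobotFileParser:
    """Build a parser from robots.txt text already in hand (no network access)."""
    rp = robotparser.RobotFileParser()
    rp.parse(lines)
    return rp


def parser_from_site(site_root: str) -> robotparser.RobotFileParser:
    """Fetch and parse <site_root>/robots.txt over the network."""
    rp = robotparser.RobotFileParser(site_root.rstrip("/") + "/robots.txt")
    rp.read()
    return rp


def may_fetch(rp: robotparser.RobotFileParser, url: str, user_agent: str, *,
              override_needed: bool, creator_permission: bool) -> bool:
    """Honor robots.txt unless ignoring it is necessary AND expressly permitted."""
    if rp.can_fetch(user_agent, url):
        return True
    # Blocked by robots.txt: both conditions must hold to proceed anyway.
    return override_needed and creator_permission
```

With a robots.txt of `User-agent: *` and `Disallow: /private/`, `may_fetch(rp, "https://example.org/private/page", "HypotheticalArchiver", override_needed=True, creator_permission=False)` returns `False`: necessity alone is not enough without the creator's permission.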

Other Resources

This page was created under the guidance of the resources linked below. Please consult them for further information related to web archiving best practices.