Digital Humanities: Analyzing
Annotation & Encoding
Digital annotation ranges from open, free-form commenting on web pages to voice annotation and structured encoding using TEI (Text Encoding Initiative). The selected resources below reflect the growing annotation landscape in research and teaching.
Webpages and Documents
Hypothes.is - This is a free, open-source, browser-based tool that allows individuals and groups to annotate webpages, PDFs, and EPUBs. You can create public or private annotations and search your annotations. There is also a Hypothesis LMS app that can be integrated into Blackboard, Moodle, Canvas, and other systems.
Annotation Studio - This suite of tools is being developed by Hyperstudio at MIT, and anyone can use it by signing up for a free account. It is an open-source web application that is designed with pedagogy in mind. Students can see threaded commentary and link to images or videos in their annotations to support their arguments. The Annotation Studio website includes a number of pedagogical case studies.
Text Encoding Initiative (TEI)
The Text Encoding Initiative (TEI) is a project that develops and maintains a standard for the representation of texts in digital form. The TEI provides guidelines for text encoding in the form of an extensible XML schema. XML, or Extensible Markup Language, is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The TEI also maintains a list of resources for learning to use the TEI guidelines for text encoding in XML.
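To make the idea of human-readable and machine-readable encoding concrete, here is a minimal sketch in Python that reads a small TEI-style XML fragment using only the standard library. The sample document, its title, and the helper name `extract_names` are invented for illustration; the element names (`persName`, `placeName`, `titleStmt`) and the namespace come from the TEI guidelines, but real projects will usually use a much richer header and a customized schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the title and sentence are invented for illustration.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Letter</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName>Ada Lovelace</persName> wrote from <placeName>London</placeName>.</p>
    </body>
  </text>
</TEI>"""

# TEI elements live in this XML namespace, so queries must be namespace-aware.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_names(tei_xml):
    """Return the title plus any tagged person and place names (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", NS).text
    people = [el.text for el in root.findall(".//tei:persName", NS)]
    places = [el.text for el in root.findall(".//tei:placeName", NS)]
    return {"title": title, "people": people, "places": places}


result = extract_names(TEI_SAMPLE)
print(result)
```

The same tags that a person can read in the source are what the program queries, which is the practical benefit of encoding texts in a shared XML vocabulary.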
VideoAnt - This is a web-based video annotation tool developed by the University of Minnesota. It can easily be used in a classroom for group projects. It is commonly used with YouTube, but annotations can be created for any publicly available video on the web. Annotations can be public or private.
- Lindsey Seatter. "Toward Open Annotation: Examples and Experiments." KULA: Knowledge Creation, Dissemination, and Preservation Studies 3, no. 1 (2019), http://doi.org/10.5334/kula.49.
HathiTrust Digital Library is a platform for preserving and accessing a variety of digitized content from sources such as Google, the Internet Archive, Microsoft, and institutional partner libraries. HathiTrust Research Center (HTRC) enables computational analysis of works in the HathiTrust Digital Library (HTDL) to facilitate non-profit research or educational uses of the collection. HTRC Algorithms are web-based, click-and-run tools to perform computational text analysis on volumes in the HathiTrust Digital Library. The algorithms can help you explore, analyze, and visualize public worksets or those you have created; they include a topic model explorer, named entity recognizer, and token count and tag cloud creator.
Voyant Tools is an open-source, web-based application for performing text analysis. It supports scholarly reading and interpretation of texts or collections of texts, particularly by scholars in the digital humanities but also by students or the general public. It can be used to analyze online texts or ones uploaded by users. Extensive documentation covers the tools in the Voyant interface.
NVivo, a product of QSR International, is a software application for qualitative and mixed-methods research, particularly the analysis of unstructured text, audio, video, and image data. Free resources and tutorials for getting started with NVivo are available online.
Image Annotation - This type of annotation can be used to label and outline images for computer vision in machine learning, a process that can train computers to recognize particular objects. A variety of tools are available. These LionBridge and Hackernoon lists describe several options, including some open-source platforms.
Distant Viewing Toolkit (DVT) - Developed by Lauren Tilton and Taylor Arnold of the Distant Viewing Lab at the University of Richmond, along with other collaborators, the DVT is a free and open-source Python module designed to automatically extract metadata features from a corpus of images (moving and static). The theoretical and interpretive framework that informs their work can be found in Distant Viewing: Analyzing Large Visual Corpora.
ArcGIS is an ESRI GIS (geographic information system) product for working with maps and geographic information. ArcGIS lessons and tutorials are available online, and ArcGIS maps can be incorporated into ESRI Story Maps, a free tool for creating interactive web apps with maps, text, and multimedia.
QGIS is a free and open-source desktop GIS (geographic information system) application that supports viewing, editing, and analyzing maps and geographic information. The QGIS training manual includes lessons and documentation for using the software; additional help is available in the QGIS user guide.
Tableau Public is a free desktop application for data manipulation and visualization. It allows you to create custom interactive visualizations and to combine multiple visualizations, text, and design elements, including maps. Tableau will host up to 10 GB of visualizations and dashboards on your public profile. Many resources for Tableau Public, including how-to videos, are available online, along with a gallery of Tableau visualizations and a blog by the Tableau Public team.
Gephi is widely used network analysis software that allows users to create and customize network graphs. Gephi plugins are also available to extend the functionality of the software. A quick-start guide and Gephi tutorials are available online.
Palladio is an online platform for analyzing temporal network data. Developed by Humanities + Design at Stanford University with support from an NEH Implementation grant, it consists of a suite of tools to graph, map, and explore complex historical network data. Read more about Palladio or check out the Palladio tutorials and FAQ.
Cytoscape was developed through the Cytoscape Consortium with NIGMS funding. It is a desktop app designed for the analysis of molecular interaction networks, but it can be used to analyze and visualize network data of any kind, from bioinformatics to social networks. Cytoscape Apps (previously called Plugins) extend the base functionality of the software, allowing additional file type imports and exports or adding additional tools for analysis and visualization. Cytoscape tutorials and documentation are available on GitHub.
Programming Languages for Analysis
R is an open-source programming language used for data analysis and visualization that runs on multiple operating systems. The R environment can be downloaded from The R Project for Statistical Computing and used with RStudio, an open-source IDE (integrated development environment). A wide variety of R packages are available that extend base R's functionality for text processing, network analysis, spatial analysis, graphics, and more.
The following publications explore R applications for Digital Humanities:
- Humanities Data in R, by Taylor Arnold and Lauren Tilton
- Computational Historical Thinking with Applications in R, by Lincoln Mullen
- Statistical Methods for Studying Literature Using R, by Jeff Rydberg-Cox
Python is a general-purpose, open-source programming language. It is commonly used in Digital Humanities contexts to perform quantitative text analysis (with libraries like the Natural Language Toolkit, or NLTK) as well as for a variety of other purposes, including network analysis and data visualization. Jupyter Notebooks are a common format for sharing and running Python code in classroom settings.
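As a rough illustration of what quantitative text analysis can look like at its simplest, the sketch below counts word frequencies with only the Python standard library. It does not use NLTK; the function name `word_frequencies`, the tiny stopword set, and the regular-expression tokenizer are assumptions made for this example, and a library such as NLTK would typically provide more careful tokenization and fuller stopword lists.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; real projects use fuller lists.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "it", "was"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and count non-stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


sample = "It was the best of times, it was the worst of times."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Pasting a block like this into a Jupyter Notebook cell is one way to experiment with it interactively, for example by swapping in a longer text or a different stopword list.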
The following resources introduce Python for Digital Humanities research: