Google Books Ngram Viewer

Last Updated: Aug 25, 2022 4:34 PM


Connect to Database

This database is freely available to the general public via the Internet.

Description

Google Books engineering manager Jon Orwant writes of the tool (http://googleblog.blogspot.com/2010/12/find-out-whats-in-word-or-five-with.html): ". . . we hope the Google Books Ngram Viewer will spark some new hypotheses ripe for in-depth investigation, and invite casual exploration at the same time." The Viewer is a visualization tool that draws on a sample of over 5.2 million books (in American English, British English, Chinese (simplified), English, English fiction, French, German, Hebrew, Russian, and Spanish) taken from the Google Books corpus of over 15 million books. It lets a user find and graph the appearance of words and phrases over time, tracing the ebb and flow of ideas, changes in style and usage, and historical change. It does this by counting every appearance of an ngram (a word or phrase) in the books, not by counting the books that contain a given ngram. Because it can display multiple datasets on the same graph, it reveals, or at least suggests, correlations.
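To make the difference between counting appearances and counting books concrete, here is a minimal Python sketch on a made-up three-book corpus. The function names, the sample texts, and the lowercase-and-split-on-whitespace tokenization are hypothetical simplifications for illustration only; they are not Google's actual processing pipeline.

```python
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def occurrence_count(books, phrase):
    """Total number of times `phrase` appears across all books."""
    target = tuple(tokenize(phrase))
    total = 0
    for text in books:
        total += Counter(ngrams(tokenize(text), len(target)))[target]
    return total


def book_count(books, phrase):
    """Number of books that contain `phrase` at least once."""
    target = tuple(tokenize(phrase))
    return sum(1 for text in books if target in ngrams(tokenize(text), len(target)))


# Hypothetical toy corpus: each string stands in for the full text of one book.
books = [
    "the great war began and the great war ended",
    "a great war story",
    "peace at last",
]

print(occurrence_count(books, "great war"))  # 3
print(book_count(books, "great war"))        # 2
```

The first book mentions "great war" twice, so counting appearances gives 3 while counting books gives 2; the Viewer works from the first kind of number.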

What is an Ngram? An ngram is a contiguous sequence of n items; in the Viewer the items are words, so an ngram is a single word or a phrase of up to five words. The Ngram dataset comprises over 500 billion words. The chronological tables that appear below a graph do not link directly to the dataset; instead, they represent a search across the entire Google Books corpus. There is no link to the books in the dataset itself. The tables nonetheless offer excellent access to an almost unimaginable wealth of primary source material.

Use the smoothing setting on the graph to choose between granularity and trend: a low value emphasizes individual years, while a high value emphasizes long-term trends. Changing this setting can dramatically change how a search is displayed. On smoothing: "Often trends become more apparent when data is viewed as a moving average. A smoothing of 1 means that the data shown for 1950 will be an average of the raw count for 1950 plus 1 value on either side: ("count for 1949" + "count for 1950" + "count for 1951"), divided by 3. So a smoothing of 10 means that 21 values will be averaged: 10 on either side, plus the target value in the center of them."
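The smoothing rule quoted above is a centered moving average. The following Python sketch implements it for a hypothetical series of yearly counts. The function name `smooth` and the sample numbers are illustrative assumptions; the sketch also assumes consecutive years with no gaps, and near the first and last years it simply averages whatever neighbouring years exist, which may not match how the Viewer itself handles the edges.

```python
def smooth(counts_by_year, smoothing):
    """Centered moving average over consecutive years.

    Each year's value is averaged with up to `smoothing` years on either
    side, i.e. 2 * smoothing + 1 values away from the ends of the series.
    Assumption: near the ends, only the years that exist are averaged.
    """
    if smoothing < 0:
        raise ValueError("smoothing must be non-negative")
    years = sorted(counts_by_year)
    values = [counts_by_year[y] for y in years]
    result = {}
    for i, year in enumerate(years):
        lo = max(0, i - smoothing)
        hi = min(len(values), i + smoothing + 1)
        window = values[lo:hi]
        result[year] = sum(window) / len(window)
    return result


# Hypothetical raw counts for five consecutive years.
counts = {1948: 10, 1949: 30, 1950: 60, 1951: 90, 1952: 10}

smoothed = smooth(counts, 1)
print(smoothed[1950])  # 60.0, i.e. (30 + 60 + 90) / 3
```

With a smoothing of 0 the function returns the raw counts unchanged; raising it widens the window and flattens year-to-year spikes into a trend line.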

For background on the Viewer and the searches it makes possible, visit http://www.culturomics.org/. You can find the experiments of many Ngram users with a simple Google search, or through the Twitter (http://twitter.com/) hashtags #ngram and #culturomics. Pivotal articles include Anthony Grafton's piece in the newsletter of the American Historical Association; Nature News' "Culturomics: Word Play" (http://www.nature.com/news/2011/110617/full/474436a.html); and the Harvard University Press Blog's "Culturomics, Close Reading, and Casaubon" (http://harvardpress.typepad.com/hup_publicity/2011/06/culturomics-close-reading-and-casaubon.html).


Dates Covered

Books in American English, British English, Chinese (simplified), English, English fiction, French, German, Hebrew, Russian, and Spanish, 1500-2008.

Print Counterpart

N/A

Librarian

Laura Taddeo
Contact:
421 Lockwood Library, North Campus
(716) 645-7970
Subjects: English