Google Books Ngram Viewer
Last Updated: Aug 25, 2022 4:34 PM
Connect to Database
- Google Books Ngram Viewer: A statistical analysis of text or speech content to find n (a number) of some sort of item in the text.
This database is freely available to the general public via the Internet.
Description
Google Books engineering manager Jon Orwant writes (http://googleblog.blogspot.com/2010/12/find-out-whats-in-word-or-five-with.html) of the tool: ". . . we hope the Google Books Ngram Viewer will spark some new hypotheses ripe for in-depth investigation, and invite casual exploration at the same time." The Viewer is a visualization tool that draws on a sample of over 5.2 million books (American English, British English, Chinese (simplified), English, English fiction, French, German, Hebrew, Russian, and Spanish) from the Google corpus of over 15 million books. It enables a user to discover and graphically display the appearance of words and phrases across time, suggesting the ebb and flow of ideas, changes in style and usage, and historical change. It does this by counting every appearance of an ngram (a word or phrase) in books, not by counting the books that contain a given ngram. By supporting the display of multiple datasets on the same graph, it reveals, or at least suggests, correlations.
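The difference between counting ngram appearances and counting books can be illustrated with a minimal Python sketch. It is not the Viewer's actual implementation: the two-book corpus, the `ngrams` helper, and the simple lowercase, whitespace-based word splitting are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(text, n):
    """Return the word n-grams of text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical toy corpus standing in for scanned books.
books = {
    "book_a": "the cat sat on the mat and the cat slept",
    "book_b": "a dog chased the cat",
}

occurrences = Counter()       # every appearance of each 2-gram
books_containing = Counter()  # number of books with at least one appearance

for title, text in books.items():
    grams = ngrams(text, 2)
    occurrences.update(grams)
    books_containing.update(set(grams))  # set() counts each book only once

print(occurrences["the cat"])       # 3 appearances across the corpus
print(books_containing["the cat"])  # 2 books contain the phrase
```

Here "the cat" appears three times but in only two books; the first kind of count is the one described above.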
What is an Ngram? Ngram is a technical term for a sequence of n consecutive items in a text. In the Viewer the items are words, so a 1-gram is a single word and a 2-gram is a two-word phrase. The Ngram dataset comprises over 500 billion words.
The chronological tables that appear below a graph do not lead directly to the dataset; instead, they represent a search across the entire Google Books corpus. There is no link to the books in the dataset. Even so, the tables offer excellent access to an almost unimaginable wealth of primary source material.
Use the smoothing capability provided in graphing to give granularity or trend focus to a search. A low smoothing value emphasizes individual years; a high value emphasizes long trends. Use of this feature will dramatically change the graphic presentation of a search. "Smoothing": "Often trends become more apparent when data is viewed as a moving average. A smoothing of 1 means that the data shown for 1950 will be an average of the raw count for 1950 plus 1 value on either side: ("count for 1949" + "count for 1950" + "count for 1951"), divided by 3. So a smoothing of 10 means that 21 values will be averaged: 10 on either side, plus the target value in the center of them."
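The moving average described in the smoothing quotation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Viewer's own code: the `smooth` function name and the yearly counts are invented, and the choice to average over a narrower window at the edges of the series is an assumption the quotation does not address.

```python
def smooth(counts, smoothing):
    """Centered moving average: each value is averaged with up to
    `smoothing` neighbours on either side.

    Assumption: near the ends of the series, only the values that
    exist are averaged, so the window is narrower there.
    """
    result = []
    for i in range(len(counts)):
        lo = max(0, i - smoothing)
        hi = min(len(counts), i + smoothing + 1)
        window = counts[lo:hi]
        result.append(sum(window) / len(window))
    return result

# Hypothetical raw counts for 1948 through 1952.
years = [1948, 1949, 1950, 1951, 1952]
raw = [2, 4, 6, 8, 13]

smoothed = smooth(raw, 1)
# With a smoothing of 1, the value for 1950 is (4 + 6 + 8) / 3 = 6.0
print(dict(zip(years, smoothed)))
```

A smoothing of 0 returns the raw values unchanged, while a larger value flattens year-to-year spikes in favour of the long-term trend.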
For background on the Viewer and the searches it makes possible, visit: http://www.culturomics.org/. Here are links to pivotal articles. You will find the experiments of many Ngram creators by simply doing a Google search. Also, try the Twitter (http://twitter.com/) hashtags #ngram and #culturomics. And see Anthony Grafton's piece in the newsletter of the American Historical Association, Nature News' "Culturomics: Word Play" (http://www.nature.com/news/2011/110617/full/474436a.html), and the Harvard University Press Blog's "Culturomics, Close Reading, and Casaubon" (http://harvardpress.typepad.com/hup_publicity/2011/06/culturomics-close-reading-and-casaubon.html).
Dates Covered
Books in American English, British English, Chinese (simplified), English, English fiction, French, German, Hebrew, Russian, and Spanish, 1500-2008.
Print Counterpart
N/A