
How to search Google Ngrams for "dated" words & phrases?

I'd like to write an application that searches Google's Ngram data to return words and phrases that used to be more popular, by some arbitrary percentage, within some arbitrary range of years, than they are now.

For example: https://books.google.com/ngrams/graph?content=cowabunga&year_start=1950&year_end=2000&corpus=15&smoothing=3

Ideally, I'd like to be able to find these words and phrases without specifying them up front. Can anyone help me come up with a way to do this using a downloaded copy of the Ngrams data?

Duncan Malashock asked Nov 10 '22 23:11



1 Answer

The first step after downloading some n-grams is to load them into a SQLite3 database. For example, I fetched the 1-grams starting with the letter 't'.

To load them into SQLite, open a database with the command sqlite3 1grams.db, then create a table and import the tab-separated file:

sqlite> create table t1grams (ngram text, year integer, match_count integer, volume_count integer);
sqlite> .separator "\t"
sqlite> .import googlebooks-eng-all-1gram-20120701-t t1grams
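If you would rather script the import than type it into the sqlite3 shell, something like the following might work. It is only a sketch using Python's built-in csv and sqlite3 modules, and it assumes each line of the downloaded file has exactly four tab-separated fields (ngram, year, match_count, volume_count). The function name load_ngrams is made up for illustration.

```python
import csv
import sqlite3


def load_ngrams(conn, tsv_lines):
    """Create t1grams (same schema as the shell example) and insert tab-separated rows.

    load_ngrams is a hypothetical helper; tsv_lines is any iterable of text
    lines, such as an open file object.
    """
    conn.execute(
        "create table if not exists t1grams "
        "(ngram text, year integer, match_count integer, volume_count integer)"
    )
    # QUOTE_NONE keeps quote characters inside an ngram as literal text.
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Assumption: well-formed rows have four fields; anything else is skipped.
    rows = (row for row in reader if len(row) == 4)
    # The integer columns have INTEGER affinity, so "1950" is stored as 1950.
    conn.executemany("insert into t1grams values (?, ?, ?, ?)", rows)
    conn.commit()
```

To use it, open the database with sqlite3.connect("1grams.db"), open the downloaded file in text mode with newline="", and pass both to load_ngrams.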

The second step is to pick the year range (call its endpoints YEAR_START and YEAR_END) and your percentage (call it PERCENT_THRESHOLD).

Your problem then reduces to a query that selects the n-grams whose match_count at YEAR_END is at least PERCENT_THRESHOLD% lower than at YEAR_START.
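One way such a query might look is a self-join of the table on ngram, pairing each n-gram's row at YEAR_START with its row at YEAR_END. This is a rough sketch, not a tuned solution: it assumes the t1grams table from the first step, compares raw match_count values exactly as described above, and wraps the SQL in a hypothetical Python helper named find_dated.

```python
import sqlite3

# s is the row at YEAR_START, e is the row at YEAR_END for the same ngram.
DATED_QUERY = """
select s.ngram, s.match_count as start_count, e.match_count as end_count
from t1grams as s
join t1grams as e on e.ngram = s.ngram
where s.year = :year_start
  and e.year = :year_end
  and s.match_count > 0
  and e.match_count <= s.match_count * (1.0 - :percent_threshold / 100.0)
order by 1.0 * e.match_count / s.match_count
"""


def find_dated(conn, year_start, year_end, percent_threshold):
    """Return (ngram, start_count, end_count) rows, steepest decline first.

    find_dated is a hypothetical helper name, not part of any library.
    """
    params = {
        "year_start": year_start,
        "year_end": year_end,
        "percent_threshold": percent_threshold,
    }
    return conn.execute(DATED_QUERY, params).fetchall()
```

For example, find_dated(conn, 1950, 2000, 50) would list the n-grams whose count in 2000 is at most half of their count in 1950. Because this is an inner join, an n-gram with no row at all for YEAR_END is skipped; catching words that vanished entirely would need a left join instead. An index on (ngram, year) would probably help on the full data set.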

tlehman answered Dec 05 '22 12:12
