What does the N-gram Viewer do?

When you enter phrases into the Metafilter n-gram viewer, it displays a graph showing how frequently those phrases have occurred in the titles of Metafilter posts over the years. You can select from four different corpora: metafilter, ask metafilter, metatalk and music.

Let's look at a sample graph:

Chart screenshot

This shows trends in three n-grams from 1999 to 2013: "I for one welcome" (a 4-gram), "what it says on the tin" (a 6-gram), and "slyt" (a 1-gram or unigram). What the y-axis shows is this: of all the 4-grams seen in each year in titles of posts, what percentage of them are "I for one welcome"? Of all the 6-grams in each year, what percentage of them are "what it says on the tin"? What percentage of unigrams are "slyt"? Here, you can see that the phrase "I for one welcome" first appeared in a title in 2004 and peaked in 2010, that the last time "what it says on the tin" was clever was 2008, and that since 2010 the use of "slyt" in post titles has exploded.
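To make the y-axis calculation concrete, here is a minimal sketch in Python. It is not the viewer's actual code: the function names, the lowercase-and-split-on-whitespace tokenization, and the sample titles are all assumptions made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_percentage(titles_by_year, phrase):
    """For each year, what percentage of all n-grams (n = length of phrase) are the phrase?"""
    # Assumed tokenization: lowercase, split on whitespace.
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, titles in titles_by_year.items():
        counts = Counter()
        for title in titles:
            counts.update(ngrams(title.lower().split(), n))
        total = sum(counts.values())
        # A year with no n-grams of this length gets 0 rather than a division error.
        result[year] = 100.0 * counts[target] / total if total else 0.0
    return result


# Hypothetical post titles, invented for this example.
titles_by_year = {
    2009: ["a quiet year", "nothing to see"],
    2010: ["slyt", "cat video slyt"],
}

print(yearly_percentage(titles_by_year, "slyt"))
```

Note that each phrase is only compared against n-grams of its own length, which is why a 4-gram and a unigram can share one y-axis even though there are far more distinct 4-grams than unigrams.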

How big does n go?

The viewer can show 1-grams, 2-grams, 3-grams, 4-grams, 5-grams and even 6-grams.

How does the n-gram viewer handle punctuation?

The n-gram viewer does some "cooking" of text before processing it:

Who is responsible for this?

I am John Wiseman, AKA jjwiseman and lemonodor. There is no official link between this project and Metafilter.

The best non-public way to contact me is via email:

Where does the data come from?

The data for the Metafilter n-gram viewer comes from the Metafilter infodump.

The data is up-to-date as of Jan. 26, 2013.

Data statistics

Site    Distinct n-grams

Are there bugs?

Yes. See this list.

I want to see the code or run it myself

That's great! All the code is available at

