
A database of questions with unambiguous numeric answers

My co-hackers and I are building a sort of trivia game inspired by this blog post: http://messymatters.com/calibration. The idea is to give confidence intervals and learn how to be calibrated (when you're "90% sure" you should be right 90% of the time).
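To make "calibrated" concrete, here is a minimal sketch of how such a game might score a player. The function name `calibration_rate` and the sample rows are made up for illustration; the numbers are placeholders, not real trivia answers.

```python
def calibration_rate(responses):
    """Return the fraction of intervals that contain the true answer.

    Each response is a (low, high, truth) tuple: the player's interval
    bounds and the correct numeric answer.
    """
    if not responses:
        raise ValueError("need at least one response")
    hits = sum(1 for low, high, truth in responses if low <= truth <= high)
    return hits / len(responses)


# Hypothetical "90% sure" intervals from one player (placeholder values).
responses = [
    (1930, 1950, 1942),  # truth inside the interval
    (100, 200, 250),     # truth outside the interval
    (5, 15, 9),          # inside
    (1800, 1900, 1876),  # inside
]
rate = calibration_rate(responses)
print(f"hit rate: {rate:.0%} (target for 90% intervals: 90%)")
```

A well-calibrated player's hit rate on 90% intervals should approach 0.9 over many questions, which is why the game needs a large question pool.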

We're thus looking for, ideally, thousands of questions with unambiguous numerical answers. Also, they shouldn't be too boring. There are a lot of random statistics out there (e.g., enclosed water area in different countries) that would make the game mind-numbing. Things like release dates of classic movies are more interesting (to most people).

Other interesting ones we've found include Olympic records, median incomes for different professions, dates of famous inventions, and celebrity ages. Scraping things like the above, by the way, was my reason for asking this question: "Scrape HTML tables from a given URL into CSV".

So, if you know of other sources of interesting numerical facts (in a parsable form), I'm eager for pointers to them. Thanks!

dreeves asked Apr 19 '10 04:04


1 Answer

Video game category

vgchartz.com has various charts for video game titles and hardware performance.

Sample queries:

  • Worldwide total sales of video game titles of all time
  • Hardware sales from 01/03/2010 to 05/22/2010: Wii-PS3-X360 in America, Japan, UK, Australia

There's enough data for questions like:

  • How many units of hardware/title X were sold in Year Y/the first week of sales?
  • Title X outsold Title Y (in their respective first N weeks of sales) by how much/what ratio?
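As a rough sketch of how the ratio question could be generated once the sales figures are scraped: the function `ratio_question` and the sales numbers below are hypothetical placeholders, not real vgchartz.com data or any API of that site.

```python
def ratio_question(title_x, sales_x, title_y, sales_y, weeks):
    """Build a (question, answer) pair comparing first-N-week sales."""
    if sales_y <= 0:
        raise ValueError("sales_y must be positive to form a ratio")
    question = (
        f"In their respective first {weeks} weeks of sales, "
        f"by what ratio did {title_x} outsell {title_y}?"
    )
    # Round so the stored answer is a single unambiguous number.
    answer = round(sales_x / sales_y, 2)
    return question, answer


# Placeholder figures for illustration only.
question, answer = ratio_question("Title X", 3_000_000, "Title Y", 1_200_000, 10)
print(question, answer)
```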

Popular music category

billboard.com is all you need.

Wikipedia links

  • Billboard charts
  • Billboard Hot 100
  • Billboard 200
  • Billboard Hot 100 50th Anniversary Charts
  • List of best-charting U.S. music artists
  • List of best-selling music artists
  • Best-selling albums in the United States since Nielsen SoundScan tracking began

In addition to sales figures, you can also ask about chart positions, e.g.:

  • In Category Y of Chart Z, where does song X place/how many songs does artist X have?

Making the most out of your data

You can make unambiguous numeric Q/A out of most lists. Take, for example, a list like the TIME.com All Time 100 Novels.

Some generic questions that can be asked are:

  • How many were written in a given time period?
    • Decade, year, in the presidency of George Bush, before 9/11, etc.
  • What's the gap in rank between Title X and Title Y?
    • Pairwise queries like this really make the most of your data!
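Both kinds of generic question can be derived mechanically from a scraped list. The sketch below assumes each entry has been reduced to a `(rank, title, year)` tuple; the rows and the helper names `count_in_period` and `rank_gap_questions` are invented for illustration and are not taken from the actual TIME list.

```python
from itertools import combinations

# Placeholder rows: (rank, title, publication year).
novels = [
    (1, "Novel A", 1925),
    (2, "Novel B", 1949),
    (3, "Novel C", 1951),
    (4, "Novel D", 1960),
]


def count_in_period(items, start_year, end_year):
    """Answer 'How many were written between start_year and end_year?'"""
    return sum(1 for _, _, year in items if start_year <= year <= end_year)


def rank_gap_questions(items):
    """Yield one (question, answer) pair per unordered pair of entries."""
    for (rank_a, title_a, _), (rank_b, title_b, _) in combinations(items, 2):
        question = f"What's the gap in rank between {title_a} and {title_b}?"
        yield question, abs(rank_a - rank_b)


questions = list(rank_gap_questions(novels))
print(count_in_period(novels, 1940, 1959), len(questions))
```

A list of n entries yields n(n-1)/2 pairwise questions, so a single Top 100 list gives 4,950 of them, which is how pairwise queries stretch a small data set.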

You can do this with any Top 100 list:

  • Time 100
  • Time 100: The Most Important People of the Century
  • Bravo's 100 Greatest TV Characters
  • TV Guide's 100 Greatest Episodes of All Time
  • List of most-watched television broadcasts

History category

historyorb.com is just one example. Its URLs and HTML are very scrape-friendly.

  • Calendar of Famous Birthdays, Deaths, Events

There are many similar sites, e.g. brainyhistory.com.

You can also use these dates to "cross" with the other data (e.g. the Top 100 Novels example above).


Movie category

The Internet Movie Database is of course... the internet movie database!

  • IMDb/USA Video Rentals Archive Calendar, All-Time World Wide Box Office
    • "How much did Movies X, Y, and Z gross in total?"
  • The plain text data files (available via FTP; read the copyright/license first)

polygenelubricants answered Sep 29 '22 13:09