
How can I get a list of all film ids from Freebase?

Tags: rdf, freebase, mql

On a project I worked on a couple of years back, I built a set of data about movies from Freebase. A simple shell script downloaded the "film.tsv" file (from http://download.freebase.com/datadumps/latest/browse/film/film.tsv). I then used the "id" field in that file to build the MQL requests for each film, retrieving the other properties I was interested in (e.g. actors, genres).

After looking at the developer's guide today, I realise that Freebase has moved on a fair bit. Significantly, the dump file I used before is no longer available. The dump format is now RDF and, from what I can tell, the dumps are only available as a single 22GB archive.

If at all possible, I would like to avoid downloading a 22GB file each time I want to rebuild my data set. Is it still possible to retrieve individual dump files, like the film.tsv file?

If not, is there an alternative way to obtain a full list of movie ids?

ddswy asked Oct 03 '22 06:10


1 Answer

There's no replacement planned for film.tsv right now. You can get the current list of film IDs from the RDF dump like this:

zgrep $'\ttype\.object\.type\tfilm\.film' freebase-rdf.gz
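The zgrep command prints whole matching lines rather than bare IDs. As a rough sketch, the same filter can be written in Python so that only the ID column is kept. This assumes each dump line is tab-separated with the topic ID in the first field, which is worth checking against a few lines of the real dump; `iter_film_ids` and `FILM_TYPE_MARKER` are hypothetical names introduced for illustration.

```python
import gzip
from typing import Iterator

# The same literal text the zgrep pattern matches: tab, predicate, tab, object.
FILM_TYPE_MARKER = "\ttype.object.type\tfilm.film"


def iter_film_ids(dump_path: str) -> Iterator[str]:
    """Yield the first tab-separated field of every line typed as film.film.

    Assumption: the subject (topic ID) is the first column of each line.
    """
    # Stream the archive line by line so the whole dump never sits in memory.
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if FILM_TYPE_MARKER in line:
                yield line.split("\t", 1)[0]
```

Something like `sorted(set(iter_film_ids("freebase-rdf.gz")))` would then give a de-duplicated list to diff against on later rebuilds.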

Then, when you need to update the list, you can query the MQL Read API for the films that have been added since your last update:

[{
  "type": "/film/film",
  "id": null,
  "name": null,
  "timestamp": null,
  "timestamp>=": "2013-12",
  "sort": "-timestamp"
}]

Since the API returns 200 results at a time, you'll need to use a cursor to get the full list of results.
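A minimal sketch of such a cursor loop follows. It rests on several assumptions that should be checked against the developer's guide: the endpoint URL, that an empty `cursor` parameter requests the first page, and that the response carries a `result` list plus a `cursor` that becomes false on the last page. `fetch_page` and `iter_results` are hypothetical helper names, and an API key may also be required.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint; confirm it against the current developer's guide.
MQLREAD_URL = "https://www.googleapis.com/freebase/v1/mqlread"

# The query from the answer above, as a Python structure.
NEW_FILMS_QUERY = [{
    "type": "/film/film",
    "id": None,
    "name": None,
    "timestamp": None,
    "timestamp>=": "2013-12",
    "sort": "-timestamp",
}]


def fetch_page(cursor: str) -> dict:
    """Request one page of results (assumed request and response shape)."""
    params = urllib.parse.urlencode(
        {"query": json.dumps(NEW_FILMS_QUERY), "cursor": cursor}
    )
    with urllib.request.urlopen(f"{MQLREAD_URL}?{params}") as response:
        return json.load(response)


def iter_results(fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Follow cursors page by page until the API stops returning one."""
    cursor = ""  # assumption: an empty cursor asks for the first page
    while True:
        page = fetch(cursor)
        yield from page.get("result", [])
        cursor = page.get("cursor")
        if not cursor:  # assumption: false or missing cursor means last page
            break
```

Passing the page fetcher in as an argument keeps the loop independent of the HTTP details: `iter_results(fetch_page)` would walk the live API, while a stub function can stand in for it during testing.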

Shawn Simister answered Oct 11 '22 17:10