On a project I was working on a couple of years back, I was building a set of data about movies from Freebase. A simple shell script downloaded the "film.tsv" file (from http://download.freebase.com/datadumps/latest/browse/film/film.tsv). I then used the "id" field in that file to build the necessary MQL requests for each of the films, retrieving the other properties I was interested in (e.g. actors and genres).
After looking at the developer's guide today, I realise that Freebase has moved on a fair bit and, significantly, that the dump file I used before is no longer available. The dump format is now RDF and, from what I can tell, the dumps are only available as a single 22 GB archive.
If at all possible, I would like to avoid downloading a 22 GB file each time I want to rebuild my data set. Is it still possible to retrieve individual dump files, such as film.tsv?
If not, is there an alternative way to obtain a full list of movie IDs?
There's no replacement planned for film.tsv right now. You can get the current list of film IDs from the RDF dump like this:
zgrep $'\ttype\.object\.type\tfilm\.film' freebase-rdf.gz
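If you would rather keep only the IDs than the full matching lines, the same filter can be sketched in Python. This is a minimal sketch, not a tested recipe against the real dump: it assumes the dump is tab-separated with the subject in the first column, and the function name `film_subjects` is made up for illustration. The pattern is copied from the zgrep command and, like it, is unanchored.

```python
import gzip
import re

# Same pattern as the zgrep command: predicate column, a tab, then object column.
FILM_TYPE = re.compile(r"\ttype\.object\.type\tfilm\.film")


def film_subjects(path):
    """Yield the first (subject) column of every line matching FILM_TYPE.

    Assumption: each line of the dump is one tab-separated triple whose
    first column is the subject ID.
    """
    # Stream the archive line by line so the whole dump never sits in memory.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if FILM_TYPE.search(line):
                yield line.split("\t", 1)[0]
```

Calling `set(film_subjects("freebase-rdf.gz"))` would then give a de-duplicated collection of subjects to compare against on later updates.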
Then, when you need to update the list, you can query the MQL Read API for films that have been added since your last update:
[{
  "type": "/film/film",
  "id": null,
  "name": null,
  "timestamp": null,
  "timestamp>=": "2013-12",
  "sort": "-timestamp"
}]
Since the API returns 200 results at a time, you'll need to use a cursor to page through the full list of results.
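The cursor loop might look roughly like the Python sketch below. Several things here are assumptions rather than anything confirmed in this thread: the endpoint URL, that an empty `cursor` parameter requests the first page, and that each response carries a `result` list plus a `cursor` that turns false on the last page. The names `fetch_page` and `all_results` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint URL and response shape ("result" list plus a "cursor"
# that is a string while more pages remain and false on the last page).
MQLREAD_URL = "https://www.googleapis.com/freebase/v1/mqlread"

# The MQL query for films added since the last update.
QUERY = [{
    "type": "/film/film",
    "id": None,
    "name": None,
    "timestamp": None,
    "timestamp>=": "2013-12",
    "sort": "-timestamp",
}]


def fetch_page(cursor):
    """Fetch one page; an empty cursor string is assumed to mean the first page."""
    params = urllib.parse.urlencode({"query": json.dumps(QUERY), "cursor": cursor})
    with urllib.request.urlopen(MQLREAD_URL + "?" + params) as response:
        return json.load(response)


def all_results(fetch=fetch_page):
    """Yield every result, following cursors until no further page is reported."""
    cursor = ""
    while True:
        page = fetch(cursor)
        yield from page.get("result", [])
        cursor = page.get("cursor")
        if not cursor:  # False, None, or missing: last page reached
            break
```

Passing the page fetcher in as an argument keeps the paging logic separate from the HTTP call, so the loop can be exercised with canned pages before pointing it at the live API.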