
Yelp data file type

I downloaded the file from the Yelp dataset challenge website (https://www.yelp.com/dataset_challenge). The download succeeded, but I cannot open the file because it does not have an extension. It is about 4 GB. I thought it might be a JSON file because, from what I found when searching around, it used to be one. However, I can't figure out how to open it or convert it to CSV. I'd like to run some analysis on this data with Python. Can anyone help me? Thank you.

Jonathan Villegas asked Apr 26 '17 02:04



2 Answers

I was having the same issue. It turns out that the file inside the tar (the one without the extension) is a tar file as well, so the download is basically a tar file inside a tar file. After extracting the original download, add the .tar extension to the extracted file, and then extract that too. Once that is done, you'll have all the different JSON files for the data set.
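Since the goal is to work in Python anyway, both extraction steps can also be done there. Below is a minimal sketch using the standard-library tarfile module, which detects tar archives by content, so the rename should not be needed on this route. The function name extract_nested_tar, the "_contents" folder suffix, and the example paths are made up for illustration, and it assumes the archive is nested exactly one level deep as described above.

```python
import tarfile
from pathlib import Path


def _extract(archive: Path, dest: Path) -> None:
    """Extract one tar archive into dest (compression is auto-detected)."""
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer a safer extraction filter.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def extract_nested_tar(outer: str, dest: str) -> list[str]:
    """Extract outer, then extract any tar files found among its contents.

    Returns the relative paths of all .json files found under dest.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    _extract(Path(outer), dest_dir)

    # Snapshot the files from the first pass before extracting anything else.
    first_pass = [p for p in dest_dir.rglob("*") if p.is_file()]
    for path in first_pass:
        # is_tarfile inspects the content, so a missing extension is fine.
        if tarfile.is_tarfile(str(path)):
            # Hypothetical naming choice: put the inner files next to the archive.
            inner_dir = path.parent / (path.name + "_contents")
            inner_dir.mkdir(exist_ok=True)
            _extract(path, inner_dir)

    return sorted(str(p.relative_to(dest_dir)) for p in dest_dir.rglob("*.json"))
```

Usage would look something like extract_nested_tar("yelp_download.tar", "yelp_data"), where both names are placeholders for your own paths.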

Bjafri5 answered Sep 22 '22 10:09



The GitHub project with Yelp dataset examples includes a few samples; one of them, "json_to_csv_converter", should help you do what you're asking for.

Yelp's Academic Dataset Examples
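If you would rather not pull in that project, here is a rough standard-library sketch of the same idea. To be clear, this is not the code from json_to_csv_converter. It assumes each data file holds one JSON object per line, which you should verify by looking at the first few lines of your files. The names flatten and json_lines_to_csv are made up for this example. It reads the file twice (once to collect the column names, once to write the rows) so that a multi-gigabyte file never has to fit in memory.

```python
import csv
import json


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys, e.g. 'attrs.wifi'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            # Lists and other values are kept as-is; csv writes their str() form.
            flat[name] = value
    return flat


def json_lines_to_csv(json_path, csv_path):
    """Convert a one-object-per-line JSON file to CSV; returns the row count."""
    # Pass 1: collect every column name, in order of first appearance.
    columns, seen = [], set()
    with open(json_path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                for name in flatten(json.loads(line)):
                    if name not in seen:
                        seen.add(name)
                        columns.append(name)

    # Pass 2: stream the rows out; missing fields become empty cells.
    rows = 0
    with open(json_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(flatten(json.loads(line)))
                rows += 1
    return rows
```

You would call it as json_lines_to_csv("some_file.json", "some_file.csv"), with the file names replaced by whatever the extracted files are actually called.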

Let me know if this helps!

William Cross answered Sep 19 '22 10:09
