I downloaded the file from the Yelp Dataset Challenge website (https://www.yelp.com/dataset_challenge). The download worked, but I cannot open the file because it has no extension. It is about 4 GB. I thought it might be a JSON file because, from what I found when searching around, it used to be one. However, I can't figure out how to open it or convert it to CSV. I'd like to run some analysis on this data with Python. Can anyone help me? Thank you.
I was having the same issue. It turns out that the file inside the tar (the one without the extension) is a tar file as well, so the download is basically a tar file inside a tar file. After extracting the original file, add the .tar extension to the extracted file, and then extract that too. Once that is done, you'll have all the different JSON files for the dataset.
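Since you want to work in Python anyway, the two extraction steps could also be done with the standard tarfile module, which does not care about the missing extension. This is only a rough sketch: the function name extract_nested_tar and the paths in the usage note are made up for illustration, and it assumes the inner archive sits somewhere inside the outer one, as described above.

```python
import tarfile
from pathlib import Path


def extract_nested_tar(outer_path, dest_dir):
    """Extract a tar file, then extract any tar files found inside it.

    Hypothetical helper: returns the sorted names of the .json files
    that end up under dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Step 1: extract the downloaded (outer) archive.
    with tarfile.open(outer_path) as outer:
        outer.extractall(dest)

    # Step 2: any extracted file that is itself a tar archive gets
    # extracted too. is_tarfile() inspects the content, so the inner
    # file does not need a .tar extension.
    for candidate in list(dest.rglob("*")):
        if candidate.is_file() and tarfile.is_tarfile(str(candidate)):
            with tarfile.open(str(candidate)) as inner:
                inner.extractall(dest)

    return sorted(p.name for p in dest.rglob("*.json"))
```

You would call it with something like extract_nested_tar("yelp_download.tar", "yelp_data"), where both names are placeholders for your own paths. Only run it on archives you trust, since extractall writes whatever paths the archive contains.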
The GitHub project for Yelp dataset examples has a few samples. One of them is "json_to_csv_converter", which should help you do what you're asking for.
Yelp's Academic Dataset Examples
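If you would rather see the general idea first, here is a minimal sketch of a JSON-to-CSV conversion using only the standard library. It is not the code from that GitHub project. It assumes each extracted file holds one JSON object per line, which you should check against your copy, and the names flatten and json_lines_to_csv are invented for this example. It reads the file line by line, so a multi-gigabyte file is never loaded into memory at once.

```python
import csv
import json


def flatten(record, prefix=""):
    """Turn nested dicts into a flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(json_path, csv_path):
    """Convert a one-object-per-line JSON file to CSV (assumed layout)."""
    # Pass 1: collect every column name, in order of first appearance.
    columns = []
    seen = set()
    with open(json_path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                for name in flatten(json.loads(line)):
                    if name not in seen:
                        seen.add(name)
                        columns.append(name)

    # Pass 2: write the rows; keys missing from a record become "".
    with open(json_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(flatten(json.loads(line)))

    return columns
```

A call would look like json_lines_to_csv("business.json", "business.csv"), with the file names standing in for whichever extracted file you want to convert. Reading the file twice is slower than a single pass, but it avoids guessing the columns up front when records do not all share the same keys.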
Let me know if this helps!