
Hadoop for JSON files

Tags: json, hadoop

Do you have any hints on the best way to deal with files containing JSON entries in Hadoop?

asked Mar 30 '12 by MaVe


People also ask

What is a true JSON record in Hadoop?

In a JSON record file, each line is its own complete JSON datum. JSON files carry their metadata inline and are splittable, but they do not support block compression. Hadoop itself offers little built-in support for JSON, though third-party tools help a great deal.

Is it possible to compress JSON files in Hadoop?

JSON files are splittable but do not support block compression, so compression has to come from a splittable codec or a third-party tool rather than from the format itself.

How to analyze JSON files in HDFS?

Since the JSON files are expected to be in HDFS, we can leverage the HdfsDataFragmenter and HdfsAnalyzer. These classes are very generic and will fragment and analyze all files stored in HDFS, regardless of the actual data format underneath.

What are the input file formats in Hadoop?

There are mainly 7 file formats supported by Hadoop:

1. Text/CSV Files
2. JSON Records
3. Avro Files
4. Sequence Files
5. RC Files
6. ORC Files
7. Parquet Files


2 Answers

There's a nice article on this from the Hadoop in Practice book:

  • http://java.dzone.com/articles/hadoop-practice
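One common approach the line-per-JSON layout enables is a simple Hadoop Streaming mapper. This is a minimal sketch, not taken from the article: it assumes each input line is one complete JSON object, and the field names and emitted key choice are illustrative only.

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming mapper for line-delimited JSON (sketch)."""
import json
import sys


def map_line(line):
    """Parse one JSON line and return tab-separated "key\tvalue" strings.

    Malformed lines are skipped rather than failing the whole task,
    which matters when a split starts mid-record or input is dirty.
    """
    try:
        record = json.loads(line)
    except ValueError:
        return []
    # Emit one pair per top-level field; a real job would pick its own key.
    return ["%s\t%s" % (k, record[k]) for k in sorted(record)]


if __name__ == "__main__":
    for line in sys.stdin:
        for out in map_line(line):
            print(out)
```

Run it with something like `hadoop jar hadoop-streaming.jar -mapper mapper.py ...`; because each record sits on its own line, the default TextInputFormat can split the file safely across mappers.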
answered Sep 26 '22 by Chris White

Twitter's elephant-bird library has a JsonStringToMap class which you can use with Pig.
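A sketch of how that looks in a Pig script (the jar name, input path, and field name here are placeholders; only the `JsonStringToMap` UDF itself comes from elephant-bird, which also needs its core and hadoop-compat jars on the classpath):

```
-- Placeholder jar path/version; register elephant-bird's dependencies too.
REGISTER 'elephant-bird-pig-4.x.jar';
DEFINE JsonToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

-- Hypothetical input: one JSON object per line.
raw   = LOAD '/data/events.json' AS (line: chararray);
parsed = FOREACH raw GENERATE JsonToMap(line) AS fields;

-- Pull a single (hypothetical) field out of the resulting map.
users = FOREACH parsed GENERATE fields#'user' AS user;
DUMP users;
```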

answered Sep 23 '22 by nil