
How do I parse JSON in Pig?

I have a lot of gzip'd log files in S3 that contain three types of log lines: b, c, and i. Types i and c are both single-level JSON:

{"this":"that","test":"4"}

Type b is deeply nested JSON. I came across this gist that talks about compiling a jar to make this work. Since my Java skills are less than stellar, I didn't really know what to do from there.

{"this":{"foo":"bar","baz":{"test":"me"},"total":"5"}}

Since the keys in types i and c are not always in the same order, specifying everything in the GENERATE regex is difficult. Is handling JSON (in a gzip'd file) possible with Pig? I am using whichever version of Pig comes preinstalled on an Amazon Elastic MapReduce instance.

This boils down to two questions: 1) Can I parse JSON with Pig (and if so, how)? 2) If I can parse JSON (from a gzip'd logfile), can I parse nested JSON objects?

Eric Lubow asked Feb 16 '11 05:02



2 Answers

Pig 0.10 and later ship with the built-in JsonLoader and JsonStorage functions.

See the Pig documentation for JSON load/store.
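
As a rough sketch of how that could look for the two record shapes in the question, a schema string can be passed to JsonLoader. This assumes Pig 0.10 or later and that each input holds a single record shape with one JSON object per line. The S3 paths and relation names are hypothetical, and every value is typed as chararray because the sample values are quoted strings. Hadoop's text input handling normally decompresses files ending in .gz on its own. Whether your Pig build tolerates keys arriving in a different order than the schema is worth checking on a small sample first.

```pig
-- Sketch only: the S3 paths below are hypothetical placeholders.
-- Assumes each input holds one record shape, one JSON object per line.

-- Flat records such as {"this":"that","test":"4"}
flat = LOAD 's3://example-bucket/logs/flat/'
       USING JsonLoader('this:chararray, test:chararray');

-- Nested records such as {"this":{"foo":"bar","baz":{"test":"me"},"total":"5"}}
-- Nested JSON objects are declared as nested tuples in the schema.
nested = LOAD 's3://example-bucket/logs/nested/'
         USING JsonLoader('this:(foo:chararray, baz:(test:chararray), total:chararray)');

-- Fields inside nested tuples are dereferenced with dots.
fields = FOREACH nested GENERATE
             this.foo      AS foo,
             this.baz.test AS baz_test,
             this.total    AS total;

DUMP fields;
```

If the three line types are mixed within the same files, as described in the question, they would need to be separated before a single-schema loader like this can be applied.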

Thejas Nair answered Sep 24 '22 20:09


After a lot of workarounds and trial and error, I was able to get this done. I did a write-up on my blog about how to do it, available here: http://eric.lubow.org/2011/hadoop/pig-queries-parsing-json-on-amazons-elastic-map-reduce-using-s3-data/

Eric Lubow answered Sep 20 '22 20:09