I want to create a Hive table out of some nested JSON data and run queries on it. Is this even possible? I've gotten as far as uploading the JSON file to S3 and launching an EMR instance, but I don't know what to type in the Hive console to turn the JSON file into a Hive table. Does anyone have an example command to get me started? I can't find anything useful on Google.

How do you make a HIVE table out of JSON data?

2 Answers

It's actually not necessary to use the JSON SerDe. There is a great blog post here (I'm not affiliated with the author in any way):

http://pkghosh.wordpress.com/2012/05/06/hive-plays-well-with-json/

It outlines a strategy that uses the built-in function json_tuple to parse the JSON at query time (NOT at table-definition time):

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-json_tuple

So basically, your table schema just loads each line as a single 'string' column, and you then extract the relevant JSON fields as needed on a per-query basis. For example, this query from that blog post:

    SELECT b.blogID, c.email
    FROM comments a
    LATERAL VIEW json_tuple(a.value, 'blogID', 'contact') b AS blogID, contact
    LATERAL VIEW json_tuple(b.contact, 'email', 'website') c AS email, website
    WHERE b.blogID = '64FY4D0B28';

In my experience, this has proven more reliable (I ran into various cryptic issues with the JSON SerDes, especially with nested objects).
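If it helps to see why the query chains two json_tuple calls, here is a rough Python stand-in for what happens to a single row. This is not Hive code and not Hive's implementation: the helper below only borrows the name json_tuple for illustration, and the sample row (including the email and website values) is made up. It assumes nested values come back as JSON text, which is what lets the second call parse b.contact in the query above.

```python
import json


def json_tuple(json_text, *keys):
    """Illustrative stand-in: pull top-level keys out of one JSON string.

    Assumption: nested objects are returned as JSON text, and anything that
    cannot be parsed yields None for every requested key.
    """
    try:
        obj = json.loads(json_text)
    except (TypeError, ValueError):
        return tuple(None for _ in keys)
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)

    values = []
    for key in keys:
        value = obj.get(key)
        if value is None or isinstance(value, str):
            values.append(value)
        else:
            # Nested objects/arrays and numbers are handed back as JSON text.
            values.append(json.dumps(value))
    return tuple(values)


# Made-up sample row: one JSON document per line, held in a single string column.
row = '{"blogID": "64FY4D0B28", "contact": {"email": "a@example.com", "website": "example.com"}}'

# First "lateral view": top-level fields; contact is still a JSON string here.
blog_id, contact = json_tuple(row, "blogID", "contact")

# Second "lateral view": parse the nested contact string.
email, website = json_tuple(contact, "email", "website")

print(blog_id, email)  # prints "64FY4D0B28 a@example.com"
```

The point is only that each level of nesting needs its own parsing step at query time, so nothing about the nesting has to be declared in the table definition.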

answered Oct 02 '22 18:10

Mike Repass

You'll need to use a JSON serde in order for Hive to map your JSON to the columns in your table.

A really good example showing you how is here:

http://aws.amazon.com/articles/2855

Unfortunately, the JSON SerDe supplied doesn't handle nested JSON very well, so you might need to flatten your JSON in order to use it.

Here's an example of the correct syntax from the article:

    create external table impressions (
      requestBeginTime string, requestEndTime string, hostname string
    )
    partitioned by (
      dt string
    )
    row format
      serde 'com.amazon.elasticmapreduce.JsonSerde'
      with serdeproperties (
        'paths'='requestBeginTime, requestEndTime, hostname'
      )
    location 's3://my.bucket/' ;
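For the flattening step mentioned above, a minimal Python sketch could look like the following. It assumes one JSON object per input line and joins nested keys with an underscore; the function names, the separator, and the sample record are all my own choices for illustration, not something from the article. Arrays are left untouched, so you would still need to decide how to handle those.

```python
import json


def flatten(obj, prefix="", sep="_"):
    """Collapse nested dicts into one level, e.g. contact.email -> contact_email."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            # Scalars (and arrays) are kept as they are.
            flat[name] = value
    return flat


def flatten_lines(lines):
    """Yield one flattened JSON string per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(flatten(json.loads(line)), sort_keys=True)


# Made-up sample record with one level of nesting.
sample = '{"blogID": "64FY4D0B28", "contact": {"email": "a@example.com", "website": "example.com"}}'
flat_line = next(flatten_lines([sample]))
print(flat_line)
```

You would run something like this over the file before uploading it to S3, then list the flattened key names in the table columns and in the 'paths' property.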

answered Oct 02 '22 19:10

seedhead

Tags:

json

hadoop

hive

emr

amazon-emr

nickponline
