Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to extract selected values from json string in Hive

I am running a simple query in Hive that produces the following output (with a few other additional columns.

|------|-----------------------------------------------------------|
| col1 | col2                                                      |
|------|-----------------------------------------------------------|
|   A  | {"variable1":123,"variable2":456,"variable3":789}         |                                          
|------|-----------------------------------------------------------|
|   B  | {"variable1":222,"variable2":333,"variable3":444}         |
--------------------------------------------------------------------

I need to be able to parse the json string and pull out the values for each token during the SELECT statement itself so that I can perhaps incorporate a WHERE statement to return only the parts of the string that are valuable to me.

So my ultimate output might look like this:

|------------------------------------------|
| col1 |variable1 | variable2 | variable3  |                                      
|------------------------------------------|
|  A   |   123    |    456    |    789     |                                    
|------------------------------------------|
|  B   |   222    |    333    |    444     |
--------------------------------------------
 

I have tried using various functions in including SPLIT and GET_JSON_OBJECT using the argument structure specified in the examples yet all return errors such as:

No matching method for class org.apache.hadoop.hive.ql.udf.UDFJson 
with (struct<...>, string). Possible choices: _FUNC_(string, string)

Could someone please tell if what I am trying to do is feasible, or explain where I am going wrong?

Thanks in advance

like image 909
jimiclapton Avatar asked Aug 04 '17 20:08

jimiclapton


People also ask

Does Hive support JSON data type?

Hive provides three different mechanisms to run queries on JSON documents, or you can write your own: Use the get_json_object user-defined function (UDF). Use the json_tuple UDF. Use the custom Serializer/Deserializer (SerDe).


1 Answers

select col1, get_json_object(col2,'$.variable1') as variable1,
get_json_object(col2,'$.variable2') as variable2,
get_json_object(col2,'$.variable3') as variable3 
from json_test

If you put your output into a table (say json_test), you can parse in this way. You can tweak your query too to obtain these results.

Output:

col1 |variable1 |variable2 |variable3 |
-----|----------|----------|----------|
A    |123       |456       |789       |
B    |222       |333       |444       |
like image 174
Prabhat Ratnala Avatar answered Oct 07 '22 10:10

Prabhat Ratnala