Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Querying JSON fields in Redshift

We plan to start using Redshift soon, and one of our fields (columns) is a a JSON value. It's a flat JSON (i.e. by definition no nested levels) and the reason we must use JSON is that each record has different number of different key-value elements, which may very from 0 to 10 or more (so I can't use a field per pair or such).

For example, such field may be {"key1":"value1", "key2":"value2", ..., "key5":"value5"}

I would like to query and count all records having some specific key, and possibly group them by value. In the example above I would like something like "select count(*) where field has key 'key1' group by its value".

Does Redshift support querying by values within the JSON? How can such be achieved?

like image 566
user2339344 Avatar asked Oct 28 '14 15:10

user2339344


People also ask

Can Redshift handle JSON data?

Though Amazon Redshift supports JSON functions over CHAR and VARCHAR columns, we recommend using SUPER for processing data in JSON serialization format.

How does Redshift store JSON data?

Redshift does not have a native JSON data type like Snowflake or other data warehouse platforms, e.g. we can not load a JSON document into Redshift as a LOB. Each document must be parsed into a single column and can be manipulated and queried with the help of JSON-SQL functions offered in Redshift.

How do you pull data from Redshift?

The first method of extracting data from AWS Redshift through SQL involves transfers to Amazon S3 files, a part of Amazon web services. You can run the process by unloadingAWS data into S3 buckets and using SSIS (SQL Server Integration Services) for copying data into SQL servers.

What is Json_extract_path_text?

The JSON_EXTRACT_PATH_TEXT function returns the value for the key:value pair referenced by a series of path elements in a JSON string. The JSON path can be nested up to five levels deep.


2 Answers

Yes, Amazon Redshift supports parsing JSON string within a column with "JSON_EXTRACT_PATH_TEXT" function, and you can call this function even in where clause or group by clause. It's better to see the following example to understand how it works.

db=> create table json_test (id int primary key, json text);
db=> insert into json_test values (1, '{"key1":1, "key2":"a"}');
db=> insert into json_test values (2, '{"key1":2, "key2":"b"}');
db=> insert into json_test values (3, '{"key1":3, "key2":"a"}');
db=> insert into json_test values (4, '{"key3":0}');
db=> select * from json_test order by id;
 id |          json
----+------------------------
  1 | {"key1":1, "key2":"a"}
  2 | {"key1":2, "key2":"b"}
  3 | {"key1":3, "key2":"a"}
  4 | {"key3":0}
(4 rows)


-- In select list
db=> select json_extract_path_text(json, 'key2') as key2 from json_test where id = 1;
 key2
------
 a
(1 row)


-- Where clause
db=> select * from json_test where json_extract_path_text(json, 'key1') = 1;
 id |          json
----+------------------------
  1 | {"key1":1, "key2":"a"}
(1 row)


-- Group by
db=> select min(id) as min_id from json_test group by json_extract_path_text(json, 'key2') order by min_id;
 min_id
--------
      1
      2
      4
(3 rows)

See Redshift Dev Guide - JSON_EXTRACT_PATH_TEXT Function for the details of "JSON_EXTRACT_PATH_TEXT" function. Also you can see other JSON functions in Redshift Dev Guide - JSON Functions.

like image 144
Masashi M Avatar answered Sep 24 '22 10:09

Masashi M


Did you try using Redshift's JSON_EXTRACT_PATH_TEXT function?

like image 31
Pop Avatar answered Sep 22 '22 10:09

Pop