We plan to start using Redshift soon, and one of our fields (columns) is a a JSON value. It's a flat JSON (i.e. by definition no nested levels) and the reason we must use JSON is that each record has different number of different key-value elements, which may very from 0 to 10 or more (so I can't use a field per pair or such).
For example, such field may be {"key1":"value1", "key2":"value2", ..., "key5":"value5"}
I would like to query and count all records having some specific key, and possibly group them by value. In the example above I would like something like "select count(*) where field has key 'key1' group by its value".
Does Redshift support querying by values within the JSON? How can such be achieved?
Though Amazon Redshift supports JSON functions over CHAR and VARCHAR columns, we recommend using SUPER for processing data in JSON serialization format.
Redshift does not have a native JSON data type like Snowflake or other data warehouse platforms, e.g. we can not load a JSON document into Redshift as a LOB. Each document must be parsed into a single column and can be manipulated and queried with the help of JSON-SQL functions offered in Redshift.
The first method of extracting data from AWS Redshift through SQL involves transfers to Amazon S3 files, a part of Amazon web services. You can run the process by unloadingAWS data into S3 buckets and using SSIS (SQL Server Integration Services) for copying data into SQL servers.
The JSON_EXTRACT_PATH_TEXT function returns the value for the key:value pair referenced by a series of path elements in a JSON string. The JSON path can be nested up to five levels deep.
Yes, Amazon Redshift supports parsing JSON string within a column with "JSON_EXTRACT_PATH_TEXT" function, and you can call this function even in where clause or group by clause. It's better to see the following example to understand how it works.
db=> create table json_test (id int primary key, json text);
db=> insert into json_test values (1, '{"key1":1, "key2":"a"}');
db=> insert into json_test values (2, '{"key1":2, "key2":"b"}');
db=> insert into json_test values (3, '{"key1":3, "key2":"a"}');
db=> insert into json_test values (4, '{"key3":0}');
db=> select * from json_test order by id;
id | json
----+------------------------
1 | {"key1":1, "key2":"a"}
2 | {"key1":2, "key2":"b"}
3 | {"key1":3, "key2":"a"}
4 | {"key3":0}
(4 rows)
-- In select list
db=> select json_extract_path_text(json, 'key2') as key2 from json_test where id = 1;
key2
------
a
(1 row)
-- Where clause
db=> select * from json_test where json_extract_path_text(json, 'key1') = 1;
id | json
----+------------------------
1 | {"key1":1, "key2":"a"}
(1 row)
-- Group by
db=> select min(id) as min_id from json_test group by json_extract_path_text(json, 'key2') order by min_id;
min_id
--------
1
2
4
(3 rows)
See Redshift Dev Guide - JSON_EXTRACT_PATH_TEXT Function for the details of "JSON_EXTRACT_PATH_TEXT" function. Also you can see other JSON functions in Redshift Dev Guide - JSON Functions.
Did you try using Redshift's JSON_EXTRACT_PATH_TEXT function?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With