Querying JSON fields in Redshift

Tags:

amazon-redshift

We plan to start using Redshift soon, and one of our fields (columns) is a JSON value. It's flat JSON (i.e. by definition no nested levels), and the reason we must use JSON is that each record has a different number of key-value elements, which may vary from 0 to 10 or more (so I can't use a field per pair or the like).

For example, such a field may be {"key1":"value1", "key2":"value2", ..., "key5":"value5"}

I would like to query and count all records having some specific key, and possibly group them by value. In the example above I would like something like "select count(*) where field has key 'key1' group by its value".

Does Redshift support querying by values within the JSON? How can this be achieved?

asked Oct 28 '14 15:10

user2339344

2 Answers

Yes, Amazon Redshift can parse a JSON string stored in a column with the JSON_EXTRACT_PATH_TEXT function, and you can call this function in the select list, the where clause, or the group by clause. The following example shows how it works.

db=> create table json_test (id int primary key, json text);
db=> insert into json_test values (1, '{"key1":1, "key2":"a"}');
db=> insert into json_test values (2, '{"key1":2, "key2":"b"}');
db=> insert into json_test values (3, '{"key1":3, "key2":"a"}');
db=> insert into json_test values (4, '{"key3":0}');
db=> select * from json_test order by id;
 id |          json
----+------------------------
  1 | {"key1":1, "key2":"a"}
  2 | {"key1":2, "key2":"b"}
  3 | {"key1":3, "key2":"a"}
  4 | {"key3":0}
(4 rows)


-- In select list
db=> select json_extract_path_text(json, 'key2') as key2 from json_test where id = 1;
 key2
------
 a
(1 row)


-- Where clause
db=> select * from json_test where json_extract_path_text(json, 'key1') = 1;
 id |          json
----+------------------------
  1 | {"key1":1, "key2":"a"}
(1 row)


-- Group by
db=> select min(id) as min_id from json_test group by json_extract_path_text(json, 'key2') order by min_id;
 min_id
--------
      1
      2
      4
(3 rows)

See "JSON_EXTRACT_PATH_TEXT Function" in the Redshift Developer Guide for details on this function. The other JSON functions are listed under "JSON Functions" in the same guide.
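
To connect this back to the original question (count all records that have a specific key, grouped by its value), the pieces above can be combined roughly as follows. This is only a sketch against the json_test table from the example. The aliases key1_value and record_count are made up for illustration, and the NULLIF wrapper is an assumption added so that a missing key is filtered out whether the function reports it as an empty string or as NULL.

```sql
-- Count the records that contain key1, grouped by the value of key1.
-- Assumes the json_test table created in the example above.
select json_extract_path_text(json, 'key1') as key1_value,
       count(*) as record_count
from json_test
-- NULLIF turns an empty string into NULL, so rows where key1 is
-- missing are dropped regardless of how the miss is reported.
where nullif(json_extract_path_text(json, 'key1'), '') is not null
group by 1
order by 1;
```

With the four sample rows, this should return one row per distinct key1 value and leave out the row with id 4, which has no key1. One caveat of this filter: a record whose key1 is present but holds an empty string would be excluded as well.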

answered Sep 24 '22 10:09

Masashi M

Did you try using Redshift's JSON_EXTRACT_PATH_TEXT function?

answered Sep 22 '22 10:09

Pop
