Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing JSON list in Snowflake - converting redshift sql to snowflake sql

I have some Redshift SQL that I'm trying to convert to snowflake SQL to extract values from a json field. The issue I'm running into is the specification of the index required.

Because I run A/B/n tests, there can be multiple indexes up to 'n'.

So I had this piece of SQL working for Redshift:

SELECT JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (e.splits,n.n),'split_type') types
     , JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (e.splits,n.n),'weight') as weight
FROM experiments e, (SELECT  (p0.n + p1.n*2 + p2.n * POWER(2,2) + p3.n * POWER(2,3) + p4.n * POWER(2,4) + p5.n * POWER(2,5)
                              + p6.n * POWER(2,6) + p7.n * POWER(2,7) + p8.n * POWER(2,8) + p9.n * POWER(2,9))::int as n
                     FROM
                         (SELECT 0 as n UNION SELECT 1) p0,
                         (SELECT 0 as n UNION SELECT 1) p1,
                         (SELECT 0 as n UNION SELECT 1) p2,
                         (SELECT 0 as n UNION SELECT 1) p3,
                         (SELECT 0 as n UNION SELECT 1) p4,
                         (SELECT 0 as n UNION SELECT 1) p5,
                         (SELECT 0 as n UNION SELECT 1) p6,
                         (SELECT 0 as n UNION SELECT 1) p7,
                         (SELECT 0 as n UNION SELECT 1) p8,
                         (SELECT 0 as n UNION SELECT 1) p9
                     Order by 1
                     ) n
WHERE types <> ''
AND weight <> ''

From reading snowlfake's documentation, it would seem like the following should work:

SELECT parse_json(parse_json(e.splits)[n.n]):split_type as types,
       parse_json(parse_json(e.splits)[n.n]):weight as weight
FROM experiments e, (SELECT (p0.n ...

However I get the error "SQL compilation error: error line 1 at position 39 invalid identifier 'N.N'"

I'm wondering if someone would be able to help with this issue?

EDIT:

experiments table looks like:

exp_ID | splits
1  | [{"id":203,"weight":50,"split_type":"a"},{"id":204,"weight":50,"split_type":"control"}]
2  | [{"id":205,"weight":33.33,"split_type":"a"},{"id":206,"weight":33.33,"split_type":"b"},{"id":207,"weight":33.33,"split_type":"c"}]
3  | [{"id":208,"weight":25,"split_type":"a"},{"id":209,"weight":25,"split_type":"b"},{"id":210,"weight":25,"split_type":"c"},{"id":211,"weight":25,"split_type":"d"}]

required output:

exp_ID | ID  | types   | weight
1      | 203 | a       | 50
1      | 204 | control | 50
2      | 205 | a       | 33.33
2      | 206 | b       | 33.33
2      | 207 | c       | 33.33
3      | 208 | a       | 25
3      | 209 | b       | 25
3      | 210 | c       | 25
3      | 211 | d       | 25
like image 509
Charlie B Avatar asked Oct 15 '21 09:10

Charlie B


People also ask

What does parse JSON do in Snowflake?

Snowflake parses the JSON into sub-columns based on the key-value pairs at load time. These keys are recorded as pointers in metadata for optimization. Structural information is dynamically derived based on the schema definition embedded in the JSON string (Schema-on-Read).

Which function helps the JSON parser to remove outer brackets in Snowflake?

Using the FLATTEN Function to Parse JSON Arrays FLATTEN removes one level of nesting.


2 Answers

With a table defined as

create temp table EXPERIMENTS(EMP_ID int, SPLITS variant);

You can insert rows like this (This is just for testing. Do not use single-row inserts for production pipelines):

insert into experiments select 1, parse_json('[{"id":203,"weight":50,"split_type":"a"},{"id":204,"weight":50,"split_type":"control"}]');
insert into experiments select 2, parse_json('[{"id":205,"weight":33.33,"split_type":"a"},{"id":206,"weight":33.33,"split_type":"b"},{"id":207,"weight":33.33,"split_type":"c"}]');
insert into experiments select 3, parse_json('[{"id":208,"weight":25,"split_type":"a"},{"id":209,"weight":25,"split_type":"b"},{"id":210,"weight":25,"split_type":"c"},{"id":211,"weight":25,"split_type":"d"}]');

With it stored in the table that way, you can query the JSON in columns like this:

select   EXP_ID
        ,VALUE:id                 as ID
        ,VALUE:split_type::string as TYPES
        ,VALUE:weight             as WEIGHT
from experiments
     ,lateral flatten(splits)
like image 79
Greg Pavlik Avatar answered Oct 22 '22 17:10

Greg Pavlik


The article below is to demonstrate various examples of using LATERAL FLATTEN to extract information from a JSON Document. Examples are provided for its utilization together with GET_PATH, UNPIVOT, and SEQ functions.

https://community.snowflake.com/s/article/Dynamically-extracting-JSON-using-LATERAL-FLATTEN

like image 2
FKayani Avatar answered Oct 22 '22 17:10

FKayani