I have thousands of XML files which each have an array of key/value pairs. Like this:
<extras>
<extra>
<key>key_name_1</key>
<value>value_1</value>
</extra>
<extra>
<key>key_name_2</key>
<value>value_2</value>
</extra>
</extras>
I need to be able to create a query that produces a single-row response for each XML file containing each of those values in a separated column named by the key. With MySQL, I was able to use extractvalue with XPath ('extras/extra[key="key_name_1"/value') for this, but with Snowflake I am not able to find a solution for this.
I have tried lateral flatten and then picking up the value from THIS array, but I haven't succeeded. It is probably something simple, but I am not able to find the solution.
I could always use something like this:
XMLGET(XMLGET("xml_data", 'extras'):"$"[0], 'value'):"$"
Unfortunately, all keys aren't always available making this useless.
Can it be that I have to do a UDF for this?
Got it working following way:
WITH Extra_1 AS (
SELECT
"id" as "id",
XMLGET(extra.value, 'value'):"$" AS "value"
FROM table,
LATERAL FLATTEN(XMLGET("xml_data", 'extras'):"$") extra
WHERE XMLGET(extra.value, 'key'):"$" = 'key_name_1'
),
WITH Extra_2 AS (
SELECT
"id" AS "id",
XMLGET(extra.value, 'value'):"$" AS "value"
FROM table,
LATERAL FLATTEN(XMLGET("xml_data", 'extras'):"$") extra
WHERE XMLGET(extra.value, 'key'):"$" = 'key_name_2'
)
SELECT
"id",
Extra_1."value" AS "key_name_1",
Extra_2."value" AS "key_name_2",
FROM table
JOIN Extra_1 ON Extra_1."id" = table."id";
JOIN Extra_2 ON Extra_2."id" = table."id";
Hopefully, someone has a simpler way to do this.
This is how easily it can be done in MySQL:
SELECT
id,
extractvalue( xml_data,'/extras/extra[key="key_name_1"]/value') AS key_name_1,
extractvalue( xml_data,'/extras/extra[key="key_name_2"]/value') AS key_name_2
FROM table;
Using FLATTEN
and XMLGET
:
CREATE OR REPLACE TABLE t
AS
SELECT 1 AS id, PARSE_XML('<extras>
<extra>
<key>key_name_1</key>
<value>value_1</value>
</extra>
<extra>
<key>key_name_2</key>
<value>value_2</value>
</extra>
</extras>') AS col_xml
UNION ALL
SELECT 2 AS id, PARSE_XML('<extras>
<extra>
<key>key_name_3</key>
<value>value_3</value>
</extra>
<extra>
<key>key_name_4</key>
<value>value_4</value>
</extra>
<extra>
<key>key_name_5</key>
<value>value_5</value>
</extra>
</extras>') AS col;
Query:
SELECT id,
XMLGET(s.VALUE, 'key'):"$"::STRING AS k
,XMLGET(s.VALUE, 'value'):"$"::STRING AS val
FROM t
,LATERAL FLATTEN(INPUT => t.col_xml:"$") s
WHERE GET(s.value, '@')::STRING='extra';
Output:
Single row:
SELECT id
,GET(XMLGET(t.col_xml, 'extra', 0), '$')[0]:"$"::STRING AS key1
,GET(XMLGET(t.col_xml, 'extra', 0), '$')[1]:"$"::STRING AS value1
,GET(XMLGET(t.col_xml, 'extra', 1), '$')[0]:"$"::STRING AS key2
,GET(XMLGET(t.col_xml, 'extra', 1), '$')[1]:"$"::STRING AS value2
,GET(XMLGET(t.col_xml, 'extra', 2), '$')[0]:"$"::STRING AS key2
,GET(XMLGET(t.col_xml, 'extra', 2), '$')[1]:"$"::STRING AS value2
FROM t;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With