Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Flatten XML array to a single row in Snowflake

I have thousands of XML files which each have an array of key/value pairs. Like this:

<extras>
  <extra>
    <key>key_name_1</key>
    <value>value_1</value>
  </extra>
  <extra>
    <key>key_name_2</key>
    <value>value_2</value>
  </extra>
</extras>

I need to be able to create a query that produces a single-row response for each XML file containing each of those values in a separated column named by the key. With MySQL, I was able to use extractvalue with XPath ('extras/extra[key="key_name_1"/value') for this, but with Snowflake I am not able to find a solution for this.

I have tried lateral flatten and then picking up the value from THIS array, but I haven't succeeded. It is probably something simple, but I am not able to find the solution.

---Update---

I could always use something like this:

XMLGET(XMLGET("xml_data", 'extras'):"$"[0], 'value'):"$"

Unfortunately, all keys aren't always available making this useless.

Can it be that I have to do a UDF for this?

--- Update 2 ---

Got it working following way:

WITH Extra_1 AS (
    SELECT
        "id" as "id",
        XMLGET(extra.value, 'value'):"$" AS "value"
    FROM table,
        LATERAL FLATTEN(XMLGET("xml_data", 'extras'):"$") extra
    WHERE XMLGET(extra.value, 'key'):"$" = 'key_name_1'
),
WITH Extra_2 AS (
    SELECT
        "id" AS "id",
        XMLGET(extra.value, 'value'):"$" AS "value"
    FROM table,
        LATERAL FLATTEN(XMLGET("xml_data", 'extras'):"$") extra
    WHERE XMLGET(extra.value, 'key'):"$" = 'key_name_2'
)
SELECT
    "id",
    Extra_1."value" AS "key_name_1",
    Extra_2."value" AS "key_name_2",
FROM table
    JOIN Extra_1 ON Extra_1."id" = table."id";
    JOIN Extra_2 ON Extra_2."id" = table."id";

Hopefully, someone has a simpler way to do this.

This is how easily it can be done in MySQL:

SELECT
    id,
    extractvalue( xml_data,'/extras/extra[key="key_name_1"]/value') AS key_name_1,
    extractvalue( xml_data,'/extras/extra[key="key_name_2"]/value') AS key_name_2
FROM table;
like image 560
Tero Reunanen Avatar asked Sep 15 '25 06:09

Tero Reunanen


1 Answers

Using FLATTEN and XMLGET:

CREATE OR REPLACE TABLE t
AS
SELECT 1 AS id, PARSE_XML('<extras>
  <extra>
    <key>key_name_1</key>
    <value>value_1</value>
  </extra>
  <extra>
    <key>key_name_2</key>
    <value>value_2</value>
  </extra>
</extras>') AS col_xml
UNION ALL
SELECT 2 AS id, PARSE_XML('<extras>
  <extra>
    <key>key_name_3</key>
    <value>value_3</value>
  </extra>
  <extra>
    <key>key_name_4</key>
    <value>value_4</value>
  </extra>
 <extra>
    <key>key_name_5</key>
    <value>value_5</value>
  </extra>
</extras>') AS col;

Query:

SELECT id,  
    XMLGET(s.VALUE,  'key'):"$"::STRING AS k
   ,XMLGET(s.VALUE,  'value'):"$"::STRING AS val
FROM t
,LATERAL FLATTEN(INPUT => t.col_xml:"$") s
WHERE GET(s.value, '@')::STRING='extra';

Output:

enter image description here


Single row:

SELECT id 
    ,GET(XMLGET(t.col_xml,  'extra', 0), '$')[0]:"$"::STRING AS key1
    ,GET(XMLGET(t.col_xml,  'extra', 0), '$')[1]:"$"::STRING AS value1
    ,GET(XMLGET(t.col_xml,  'extra', 1), '$')[0]:"$"::STRING AS key2
    ,GET(XMLGET(t.col_xml,  'extra', 1), '$')[1]:"$"::STRING AS value2
    ,GET(XMLGET(t.col_xml,  'extra', 2), '$')[0]:"$"::STRING AS key2
    ,GET(XMLGET(t.col_xml,  'extra', 2), '$')[1]:"$"::STRING AS value2
FROM t;

enter image description here

like image 192
Lukasz Szozda Avatar answered Sep 17 '25 20:09

Lukasz Szozda