Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert JSON Array of Arrays to columns and rows

I'm pulling data from an API in JSON with a format like the example data below. Where essentially every "row" is an array of values. The API doc defines the columns and their types in advance. So I know the col1 is, for example, a varchar, and that col2 is an int.

CREATE TEMP TABLE dat (data json);
INSERT INTO dat
VALUES ('{"COLUMNS":["col1","col2"],"DATA":[["a","1"],["b","2"]]}');

I want to transform this within PostgreSQL 9.3 such that I end up with:

col1 | col2
------------
  a  |  1
  b  |  2

Using json_array_elements I can get to:

SELECT json_array_elements(data->'DATA') 
FROM dat

json_array_elements
json
---------
["a","1"]
["b","2"]

but then I can't figure out how to do either convert the JSON array to a PostgreSQL array so I can perform something like unnest(ARRAY['a','1'])

like image 352
raphael Avatar asked Aug 17 '15 13:08

raphael


People also ask

How do you transform an array in JSON?

You convert the whole array to JSON as one object by calling JSON. stringify() on the array, which results in a single JSON string. To convert back to an array from JSON, you'd call JSON. parse() on the string, leaving you with the original array.

What is difference in [] and {} in JSON?

{} denote containers, [] denote arrays.

Can you have an array of arrays in JSON?

3.1. We can store an array inside another JSON array. It is known as an array of arrays or a multi-dimensional JSON array.

Can a JSON file have multiple arrays?

JSON array can store multiple values. It can store string, number, boolean or object in JSON array. In JSON array, values must be separated by comma.


1 Answers

General case for unknown columns

To get a result like

col1 | col2
------------
  a  |  1
  b  |  2

will require a bunch of dynamic SQL, because you don't know the types of the columns in advance, nor the column names.

You can unpack the json with something like:

SELECT
  json_array_element_text(colnames, colno) AS colname,
  json_array_element_text(colvalues, colno) AS colvalue,
  rn,
  idx,
  colno
FROM (
  SELECT
    data -> 'COLUMNS' AS colnames,
    d AS colvalues,
    rn,
    row_number() OVER () AS idx
  FROM (
    SELECT data, row_number() OVER () AS rn FROM dat
  ) numbered
  cross join json_array_elements(numbered.data -> 'DATA') d
) elements
cross join generate_series(0, json_array_length(colnames) - 1) colno;

producing a result set like:

 colname | colvalue | rn | idx | colno 
---------+----------+----+-----+-------
 col1    | a        |  1 |   1 |     0
 col2    | 1        |  1 |   1 |     1
 col1    | b        |  1 |   2 |     0
 col2    | 2        |  1 |   2 |     1
(4 rows)

You can then use this as input to the crosstab function from the tablefunc module with something like:

SELECT * FROM crosstab('
SELECT
  to_char(rn,''00000000'')||''_''||to_char(idx,''00000000'') AS rowid,
  json_array_element_text(colnames, colno) AS colname,
  json_array_element_text(colvalues, colno) AS colvalue
FROM (
  SELECT
    data -> ''COLUMNS'' AS colnames,
    d AS colvalues,
    rn,
    row_number() OVER () AS idx
  FROM (
    SELECT data, row_number() OVER () AS rn FROM dat
  ) numbered
  cross join json_array_elements(numbered.data -> ''DATA'') d
) elements
cross join generate_series(0, json_array_length(colnames) - 1) colno;
') results(rowid text, col1 text, col2 text);

producing:

        rowid        | col1 | col2 
---------------------+------+------
  00000001_ 00000001 | a    | 1
  00000001_ 00000002 | b    | 2
(2 rows)

The column names are not retained here.

If you were on 9.4 you could avoid the row_number() calls and use WITH ORDINALITY, making it much cleaner.

Simplified with fixed, known columns

Since you apparently know the number of columns and their types in advance the query can be considerably simplified.

SELECT
  col1, col2
FROM (
  SELECT
    rn,
    row_number() OVER () AS idx,
    elem ->> 0 AS col1,
    elem ->> 1 :: integer AS col2
  FROM (
    SELECT data, row_number() OVER () AS rn FROM dat
  ) numbered
  cross join json_array_elements(numbered.data -> 'DATA') elem
  ORDER BY 1, 2
) x;

result:

 col1 | col2 
------+------
 a    |    1
 b    |    2
(2 rows)

Using 9.4 WITH ORDINALITY

If you were using 9.4 you could keep it cleaner using WITH ORDINALITY:

SELECT
  col1, col2
FROM (
  SELECT
    elem ->> 0 AS col1,
    elem ->> 1 :: integer AS col2
  FROM
    dat
  CROSS JOIN
    json_array_elements(dat.data -> 'DATA') WITH ORDINALITY AS elements(elem, idx)
  ORDER BY idx
) x;
like image 139
Craig Ringer Avatar answered Sep 23 '22 01:09

Craig Ringer