
Array of arrays in PostgreSQL

I'm using the %% operator on PostgreSQL's hstore type, which converts an hstore (effectively a key-value type) into an array of key/value pairs of the form {{key, value}, {key, value}}.

When I try to return an array of these flattened hstores I get this error: "could not find array type for data type text[]", due to PostgreSQL's lack of support for an array of arrays.

From a curiosity standpoint, does anyone know why these are not supported? And more importantly, is there a workaround for this type of scenario?

At the moment I'm concatenating the results into a comma-separated string and parsing them on the application side (C# and Npgsql). However, this approach doesn't feel quite right. I'd like to be able to read the row back as a .NET array of arrays, an array of key-values, etc.

Many thanks.

harman_kardon asked Feb 06 '12 11:02





2 Answers

PostgreSQL has limited "array of arrays" support

see manual

It is a restricted form of "array of arrays". As Pavel (answer) says, it is called a "multidimensional array" but is really a matrix, so all sub-arrays at a given dimension must have the same number of elements.

You can use this kind of structure to map multidimensional and heterogeneous Cartesian coordinates in scientific applications, but not to store arbitrary vectors of vectors such as XML or JSON data.

NOTE: a well-known 2-dimensional (2D) homogeneous array is the mathematical matrix. In fact, it was the scientific applications of matrices that motivated the "PostgreSQL constrained multidimensional array" datatype and the behaviour of the array functions with this kind of array. Think of a "3D array" as a "3D matrix", a "4D array" as a "4D matrix", and so on.

EXAMPLES:

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
---------------------
 {{1,2},{3,4},{5,6}}
SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[[5,6]]); -- SAME RESULT

SELECT ARRAY[ARRAY[1,2],ARRAY[5,6]];
---------------
 {{1,2},{5,6}}

SELECT array_cat(ARRAY[ARRAY[1,2]],ARRAY[3]); -- ERROR1
SELECT ARRAY[ARRAY[1,2],ARRAY[4]];  -- ERROR2 

The comments of @Daniel_Lyons about "why these are not supported" concern non-uniform arrays of arrays (see the error cases above).

ERROR1 above: array_cat accepts a 1-D array next to a 2-D one only when its length matches the inner arrays; here {3} has one element while {1,2} has two.

ERROR2 above: all arrays at a given dimension must have the same length, like the rows of a matrix.
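To see why a given pair of arrays is or is not compatible, it can help to inspect their shape first. The query below is only a sketch using the built-in array_ndims(), array_dims() and array_length() functions (available from PostgreSQL 8.4); the aliases a, s, ndims, dims and inner_len are names chosen here for illustration.

```sql
-- Inspect the shape of a 2-D array before concatenating to it.
SELECT array_ndims(a)     AS ndims,      -- number of dimensions
       array_dims(a)      AS dims,       -- bounds of every dimension, as text
       array_length(a, 2) AS inner_len   -- length of each inner array
FROM  (SELECT ARRAY[[1,2],[3,4],[5,6]] AS a) AS s;
-- ndims = 2, dims = [1:3][1:2], inner_len = 2
```

A 1-D array can be added as a new row with array_cat only if its length equals inner_len.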

Another curious thing about the built-in functions and operators: the "default behaviour" in PostgreSQL is for one-dimensional arrays and scalar elements. There is no overload of the standard array_append() that takes an array as the element:

SELECT array_append(ARRAY[1,2],5); -- OK: 5 is an element
 {1,2,5}

SELECT array_cat(ARRAY[1,2], ARRAY[5,6]);
----------
 {1,2,5,6}

SELECT array_append(ARRAY[[1,2],[3,4]], ARRAY[5,6]); -- ERROR3 
SELECT array_append(ARRAY[1,2],ARRAY[5,6]); -- ERROR4

ERROR3 above: there is NO OVERLOAD of array_append() that appends an array as an element (even in PostgreSQL 9.2).

ERROR4 above: use array_cat to merge both arrays into one.

The "merge behaviour" of the last array_cat example is curious: it does not produce an array of arrays. Use array_cat(a1, ARRAY[a2]) to achieve that result:

SELECT array_cat(ARRAY[1,2], ARRAY[ARRAY[5,6]]);  -- seems illogical...
---------------
{{1,2},{5,6}}
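The same wrapping trick can be folded into an aggregate, which is one way to collect one row per input record into a single 2-D array. This is a sketch under stated assumptions, not part of the original answer: array_agg_mult is a hypothetical name, and the declaration below is for releases where array_cat is declared on anyarray (from PostgreSQL 14 the aggregate would have to be declared with anycompatiblearray instead). It still only works when every row has the same length.

```sql
-- Hypothetical helper: aggregate arrays by concatenating them with array_cat.
CREATE AGGREGATE array_agg_mult (anyarray) (
    SFUNC    = array_cat,   -- state := array_cat(state, next_value)
    STYPE    = anyarray,
    INITCOND = '{}'         -- start from an empty array
);

-- Each row is wrapped as ARRAY[ARRAY[k, v]] so array_cat adds it as a new row.
SELECT array_agg_mult(ARRAY[ARRAY[k, v]] ORDER BY k) AS pairs
FROM  (VALUES ('a', '1'), ('b', '2')) AS t(k, v);
-- pairs = {{a,1},{b,2}}
```

From PostgreSQL 9.5 onward the built-in array_agg() also accepts array inputs of equal dimensions, so a custom aggregate like this is mainly of interest on older versions.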

Sparse matrix

To avoid problems with sparse matrices and similar data structures, use the function below. It pads the array up to the requested length, setting the missing elements to NULL (or to any constant value).

 CREATE OR REPLACE FUNCTION array_fillTo(
    p_array anyarray, p_len integer, p_null anyelement DEFAULT NULL
 ) RETURNS anyarray AS $f$
   -- Pad p_array (first dimension) with p_null until it has p_len elements.
   SELECT CASE
       WHEN len = 0 THEN array_fill(p_null, ARRAY[p_len])
       WHEN len < p_len THEN p_array || array_fill(p_null, ARRAY[p_len - len])
       ELSE p_array END
   FROM ( SELECT COALESCE( array_length(p_array, 1), 0) ) t(len)
 $f$ LANGUAGE SQL IMMUTABLE;

PS: please edit this answer to add any corrections/optimizations, it is a Wiki!

Returning to the first examples, we can now avoid the errors (see ERROR1):

SELECT array_cat(ARRAY[ARRAY[1,2]],array_fillTo(ARRAY[3],2));
-- {{1,2},{3,NULL}}
SELECT array_cat(
   ARRAY[ARRAY[1.1::float,2.0]],
   array_fillTo(ARRAY[]::float[],2,0::float)
);
-- {{1.1,2},{0,0}}
SELECT array_fillto(array['Hello'],2,'');
-- {Hello,""}

NOTE about old array_fillTo()

array_fill() became a built-in function in PostgreSQL 8.4. For 8.3 or older:

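 -- DEFAULT argument values also arrived in 8.4, so on 8.3 the fill value
 -- must always be passed explicitly (pass NULL to pad with NULLs).
 CREATE FUNCTION array_fillTo(anyarray, integer, anyelement)
 RETURNS anyarray AS $$
   DECLARE
     i integer;
     len integer;
     ret ALIAS FOR $0;
   BEGIN
     -- array_upper() is used because array_length() is 8.4+ as well;
     -- COALESCE treats an empty array as length 0.
     len := COALESCE(array_upper($1, 1), 0);
     ret := $1;
     IF len < $2 THEN
         FOR i IN 1..($2 - len) LOOP
           ret := ret || $3;
         END LOOP;
     END IF;
     RETURN ret;
   END;
 $$ LANGUAGE plpgsql IMMUTABLE;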
7 revs, 2 users 95% answered Oct 01 '22 05:10



From a curiosity standpoint, does anyone know why these are not supported?

One generic answer is that arrays are intrinsically anti-relational. Removing repeating groups is how you achieve first normal form. Having repeating groups of repeating groups seems quite insane from a relational-theory standpoint.

In general, the relationally-correct thing to do is to extract a table for your repeating values. So if you modeled something like this:

CREATE TABLE users (
  id integer primary key,
  name varchar,
  favorite_colors varchar[],
  ...
);

it would behoove you to redefine this relationally like so:

CREATE TABLE users (
  id integer primary key,
  name varchar,
  ...
);

CREATE TABLE favorite_colors (
  user_id integer references users,
  color varchar
);

Or even:

CREATE TABLE users (
  id integer primary key,
  name varchar,
  ...
);

CREATE TABLE colors (
  color varchar primary key
);

CREATE TABLE favorite_colors (
  user_id integer references users,
  color varchar references colors,
  primary key (user_id, color)
);
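If the application still wants one array per user, it can be rebuilt at query time from the normalized tables. The query below is a minimal sketch that assumes the users and favorite_colors tables defined above; the aliases u and fc are chosen here for illustration, and ORDER BY inside an aggregate needs PostgreSQL 9.0 or later.

```sql
-- Rebuild a per-user array of colours from the normalized tables.
SELECT u.id,
       u.name,
       array_agg(fc.color ORDER BY fc.color) AS favorite_colors
FROM   users AS u
JOIN   favorite_colors AS fc ON fc.user_id = u.id
GROUP  BY u.id, u.name;
```

Because this uses an inner join, users without any favorite colour do not appear in the result.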

hstore supports a lot of functions, many of which make it easy to integrate into a relational worldview. I think the simplest way to solve your problem would be to use the each() function to convert your hstore values into relations that you can then use like a normal set of rows. This is how you handle multiple values in other databases anyway: by querying and working with result sets.
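As a rough illustration of the each() approach, the query below expands two hstore values into one row per key/value pair. The inline VALUES list and the names s, id, attrs and kv are assumptions made for this sketch; LATERAL needs PostgreSQL 9.3 or later, and CREATE EXTENSION needs 9.1 or later.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- One output row per key/value pair, tagged with the id of its source row.
SELECT s.id, kv.key, kv.value
FROM  (VALUES (1, 'a=>1, b=>2'::hstore),
              (2, 'c=>3'::hstore)) AS s(id, attrs)
CROSS JOIN LATERAL each(s.attrs) AS kv
ORDER BY s.id, kv.key;
-- (1, a, 1), (1, b, 2), (2, c, 3)
```

The client then reads an ordinary result set and can group rows by id, so no array of arrays has to cross the wire.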

Daniel Lyons answered Oct 01 '22 05:10
