
How to access the internal array index in PostgreSQL?

This is my (perhaps familiar to you) non-optimized solution, a workaround for the lack of an optimized built-in function in PostgreSQL:

CREATE FUNCTION unnest_with_idx(anyarray)
RETURNS TABLE(idx integer, val anyelement) AS
$$ 
   SELECT generate_series(1,array_upper($1,1)) as idx, unnest($1) as val;
$$ LANGUAGE SQL IMMUTABLE;

Test:

SELECT idx,val from unnest_with_idx(array[1,20,3,5]) as t;

But, as I said, it is not optimized. I can't believe (!!) that PostgreSQL doesn't have an internal index for arrays. If it does, the question is how to access that index directly: where is the GIN-like internal counter?

NOTE1: the solution above and the question are not the same as "How do you create an index by each element of an array?", nor the same as "Can PostgreSQL index array columns?", because the function is for an isolated array, not for a table index on array fields.


NOTE2 (edited after answers): "array indexes" (the more popular term), "array subscripts" and "array counter" are terms we can use to refer to the "internal counter", the accumulator that advances to the next array item. I see that no PostgreSQL command offers direct access to this counter. Like generate_series(), the generate_subscripts() function is a sequence generator, and its performance is (better, but) nearly the same. On the other hand, the row_number() function offers direct access to an "internal counter of rows", but it is about rows, not arrays, and unfortunately its performance is worse.

asked Sep 03 '12 11:09 by Peter Krauss



1 Answer

Postgres 9.4 or later

When working with 1-dimensional arrays and standard subscripts (which is almost always the case), use the new WITH ORDINALITY instead:

SELECT t.*
FROM   unnest(ARRAY[1,20,3,5]) WITH ORDINALITY t(val, idx);

See:

  • PostgreSQL unnest() with element number

Just make sure you don't trip over non-standard subscripts. See:

  • Normalize array subscripts so they start with 1
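To illustrate the caveat about non-standard subscripts, here is a minimal sketch (the column aliases ordinal_position and actual_subscript are made up for this example). WITH ORDINALITY numbers the rows produced by unnest() starting at 1, so for an array declared with subscripts 4 to 7 it does not report the real subscripts. For a 1-dimensional array, the real subscript can be recovered by adding the lower bound:

```sql
-- Requires Postgres 9.4 or later (WITH ORDINALITY).
-- The ordinality column counts rows from 1, regardless of the array's lower bound.
SELECT t.idx                           AS ordinal_position  -- 1, 2, 3, 4
     , t.idx + array_lower(x.a, 1) - 1 AS actual_subscript  -- 4, 5, 6, 7
     , t.val
FROM  (SELECT '[4:7]={1,20,3,5}'::int[] AS a) x
     , unnest(x.a) WITH ORDINALITY AS t(val, idx);
```

Note that the ordinality column is of type bigint, not integer.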

Postgres 9.3 or earlier

(Original answer.)

Postgres does provide dedicated functions to generate array subscripts:

WITH   x(a) AS (VALUES ('{1,20,3,5}'::int[]))
SELECT generate_subscripts(a, 1) AS idx
     , unnest(a) AS val
FROM   x;

Effectively, it does almost the same as @Frank's query, just without a subquery.
Plus, it also works for subscripts that do not start with 1.

Either solution works for 1-dimensional arrays only! (Can easily be expanded to multiple dimensions.)
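As a rough sketch of what an expansion to two dimensions might look like (not part of the original answer; the aliases i and j are chosen for illustration), one generate_subscripts() call per dimension can be placed in the FROM list and the element fetched by subscript. This assumes Postgres 9.3 or later, since the functions in FROM refer to a column of a preceding FROM item (an implicit LATERAL reference):

```sql
-- One row per element of a 2-dimensional array, with both subscripts.
WITH x(a) AS (VALUES ('{{1,2,3},{4,5,6}}'::int[]))
SELECT i, j, a[i][j] AS val
FROM   x
     , generate_subscripts(x.a, 1) AS i   -- subscripts of the first dimension
     , generate_subscripts(x.a, 2) AS j   -- subscripts of the second dimension
ORDER  BY i, j;
```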

Function:

CREATE OR REPLACE FUNCTION unnest_with_idx(anyarray) 
  RETURNS TABLE(idx integer, val anyelement)
  LANGUAGE sql IMMUTABLE AS
$func$
  SELECT generate_subscripts($1, 1), unnest($1);
$func$;

Call:

SELECT * FROM unnest_with_idx('{1,20,3,5}'::int[]);

Also consider:

SELECT * FROM unnest_with_idx('[4:7]={1,20,3,5}'::int[]);

About custom array subscripts:

  • Normalize array subscripts so they start with 1

To get normalized subscripts starting with 1 for a 1-dimensional array:

SELECT generate_series(1, array_length($1,1)) ...

That's almost the query you already had, just with array_length() instead of array_upper(). The latter would fail with non-standard subscripts.
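Spelled out as a complete query, this might look like the sketch below (the aliases actual_idx and normalized_idx are only illustrative). It puts the real subscripts and the normalized ones side by side for an array with custom subscripts:

```sql
-- All three set-returning functions yield 4 rows, so they advance in lockstep.
WITH x(a) AS (VALUES ('[4:7]={1,20,3,5}'::int[]))
SELECT generate_subscripts(a, 1)              AS actual_idx      -- 4, 5, 6, 7
     , generate_series(1, array_length(a, 1)) AS normalized_idx  -- 1, 2, 3, 4
     , unnest(a)                              AS val             -- 1, 20, 3, 5
FROM   x;
```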

Performance

I ran a quick test on an array of 1000 integers with all queries presented here so far. They all perform about the same (~ 3.5 ms), except for row_number() on a subquery (~ 7.5 ms), as expected because of the subquery.
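The exact test setup is not shown here. One possible way to repeat such a comparison, assuming a 1000-element array built with array_agg() over generate_series(), is to wrap each candidate query in EXPLAIN ANALYZE and compare the reported execution times:

```sql
-- Build a 1000-element int array on the fly and time one of the variants.
-- Swap the outer SELECT list for each query you want to compare.
EXPLAIN ANALYZE
SELECT generate_subscripts(a, 1) AS idx
     , unnest(a)                 AS val
FROM  (
   SELECT array_agg(n) AS a
   FROM   generate_series(1, 1000) AS g(n)
   ) x;
```

Timings depend on hardware, version and caching, so it is worth running each variant several times.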

answered Sep 29 '22 19:09 by Erwin Brandstetter