
eliminate duplicate array values in postgres

Tags:

postgresql

I have an array of type bigint. How can I remove the duplicate values in that array?

Ex: array[1234, 5343, 6353, 1234, 1234]

I should get array[1234, 5343, 6353, ...]

I tried the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) from the Postgres manual, but it does not work.

GVK asked Oct 22 '10 06:10


4 Answers

I faced the same problem, but in my case the array is created with the array_agg function. Fortunately, array_agg can aggregate DISTINCT values directly:

  array_agg(DISTINCT value)

This works for me.
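As a rough sketch of how this looks in a complete query: the inline VALUES list and the names grp and val below are made up for illustration. The ORDER BY inside the aggregate is optional and only makes the element order explicit.

```sql
-- Hypothetical sample data: (grp, val) pairs with repeated values per group.
SELECT grp,
       array_agg(DISTINCT val ORDER BY val) AS vals
FROM (VALUES ('a', 1234::bigint),
             ('a', 5343),
             ('a', 1234),
             ('b', 6353),
             ('b', 6353)) AS t(grp, val)
GROUP BY grp
ORDER BY grp;
--  grp |    vals
-- -----+-------------
--  a   | {1234,5343}
--  b   | {6353}
```

This only helps when the array is being built by aggregation; for an array that already exists, it has to be unnested first.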

Mikhail Lisakov answered Nov 20 '22 17:11


The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.

To enable its use, you must install the module.
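A minimal sketch of enabling the module, assuming PostgreSQL 9.1 or later (where CREATE EXTENSION is available), that the contrib packages are installed on the server, and that your role is allowed to create extensions. Note that intarray only handles integer (int4) arrays, so a bigint[] value is not covered by it.

```sql
-- Install the contrib module into the current database (no-op if already present).
CREATE EXTENSION IF NOT EXISTS intarray;

-- The manual's example now resolves: sort() orders the array, uniq() drops adjacent duplicates.
SELECT uniq(sort('{1,2,3,2,1}'::int[]));
-- {1,2,3}
```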

If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of a different type, you have two other options.

If you have at least PostgreSQL 8.4, you can take advantage of the unnest(anyarray) function:

SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
 ?column? 
----------
 {1,2,3}
(1 row)
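The same unnest query can be applied to a stored bigint[] column, which is the case the question asks about. This is only a sketch: the table events and its columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical table holding a bigint[] column with duplicates.
CREATE TEMP TABLE events (id int PRIMARY KEY, user_ids bigint[]);
INSERT INTO events VALUES
    (1, ARRAY[1234, 5343, 6353, 1234, 1234]::bigint[]),
    (2, NULL);

-- Rewrite each array without duplicates. The WHERE clause leaves NULL arrays
-- untouched; without it they would be turned into empty arrays.
UPDATE events
SET user_ids = ARRAY(SELECT DISTINCT unnest(user_ids) ORDER BY 1)
WHERE user_ids IS NOT NULL;

SELECT user_ids FROM events WHERE id = 1;
-- {1234,5343,6353}
```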

Alternatively, you could create your own function to do this:

CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
  SELECT ARRAY(
    SELECT DISTINCT $1[s.i]
    FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
    ORDER BY 1
  );
$body$;

Here is a sample invocation:

SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
 array_sort_unique 
-------------------
 {1,2,3}
(1 row)

mnencia answered Nov 20 '22 16:11


Where are the standard libraries (?) for this kind of array_X utility?

I searched and found a few, but none of them is a standard:

  • postgres.cz/wiki/Array_based_functions: good reference!

  • JDBurnZ/postgresql-anyarray: a good initiative, but it needs some collaboration to enhance.

  • wiki.postgresql.org/Snippets: a stalled initiative, but it is the "official wiki"; it needs some collaboration to enhance.

  • MADlib: good! ... but it is an elephant, not a "pure SQL snippets lib".


Simplest and perhaps fastest array_distinct() snippet-lib function

Here is the simplest, and perhaps fastest, implementation of array_unique() or array_distinct():

CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
  SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;

NOTE: it works as expected with any data type, except with arrays of arrays:

SELECT  array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ), 
        array_distinct( array['3','3','hello','hello','bye'] ), 
        array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
 -- "{1,2,3,4,6,8,99}",  "{3,bye,hello}",  "{3,5,6}"

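The "side effect" is that nested arrays are exploded into a single set of elements, since unnest expands a multidimensional array into all of its elements.

PS: it works fine with arrays of JSONB values: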

SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
 -- "{"[3, 3]","[5, 6]"}"

Edit: a more complex but useful variant, with a "drop nulls" parameter:

CREATE FUNCTION array_distinct(
      anyarray, -- input array 
      boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
      SELECT array_agg(DISTINCT x) 
      FROM unnest($1) t(x) 
      WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
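
A short usage sketch for the two-argument version, assuming the function above has been created and is the only array_distinct in the schema. If the earlier one-argument version is still defined, I believe a call that omits the flag becomes ambiguous, so either drop the old version first or always pass the flag explicitly as done here.

```sql
-- Flag set: NULL elements are filtered out before aggregation.
SELECT array_distinct(ARRAY[1, NULL, 2, 2, NULL], true);
-- {1,2}

-- Flag unset: NULLs are kept, collapsed into a single NULL element.
SELECT array_distinct(ARRAY[1, NULL, 2, 2, NULL], false);
-- {1,2,NULL}
```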

4 revs answered Nov 20 '22 15:11


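Removing duplicates with DISTINCT does not preserve the original element order; in practice array_agg(DISTINCT ...) returns the elements sorted. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be written as follows (should work from 9.4 onwards, where WITH ORDINALITY was introduced):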

CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
    array_agg(distinct_value ORDER BY first_index)
FROM 
    (SELECT
        value AS distinct_value, 
        min(index) AS first_index 
    FROM 
        unnest($1) WITH ORDINALITY AS input(value, index)
    GROUP BY
        value
    ) AS unique_input
;
$body$
LANGUAGE 'sql' IMMUTABLE STRICT;
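
A sample invocation, assuming array_uniq_stable has been created as above; the input values are made up to show the difference from the DISTINCT-based approach.

```sql
-- Keeps each value at the position of its first occurrence.
SELECT array_uniq_stable(ARRAY[5343, 1234, 6353, 1234, 5343]::bigint[]);
-- {5343,1234,6353}

-- For comparison, the DISTINCT aggregate on the same input comes back sorted.
SELECT array_agg(DISTINCT x)
FROM unnest(ARRAY[5343, 1234, 6353, 1234, 5343]::bigint[]) AS t(x);
-- {1234,5343,6353}
```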

tbussmann answered Nov 20 '22 15:11