array_agg contains another array_agg

Question

t1
id|entity_type
9|3
9|4
9|5
2|3
2|5
           
t2  
id|entity_type
1|3
1|4
1|5

SELECT t1.id, array_agg(t1.entity_type)
    FROM t1
GROUP BY
    t1.id
HAVING ARRAY_AGG(t1.entity_type ORDER BY t1.entity_type) = 
    (SELECT ARRAY_AGG(t2.entity_type ORDER BY t2.entity_type) 
        FROM t2
    WHERE t2.id = 1
    GROUP BY t2.id);

Result:

t1.id = 9|array_agg{3,4,5}

I have two tables t1 and t2. I want to get value of t1.id where t1.entity_type array equals t2.entity_type array.

In this scenario everything works fine. For t2.id = 1 I receive t1.id = 9. Both have the same array of entity_type: {3,4,5}

Now I'd like to get t1.id not only for equal sets, but also for smaller sets. If I modify t2 this way:

t2  
id|entity_type
1|3
1|4

and modify query this way:

SELECT t1.id, array_agg(t1.entity_type)
    FROM t1
GROUP BY
    t1.id
HAVING ARRAY_AGG(t1.entity_type ORDER BY t1.entity_type) >= /*MODIFICATION*/
    (SELECT ARRAY_AGG(t2.entity_type ORDER BY t2.entity_type) 
        FROM t2
    WHERE t2.id = 1
    GROUP BY t2.id);

I don't receive the expected result:

t1.id = 1 has {3, 4, 5}     
t2.id = 1 has {3, 4}

Arrays in t1 that contain the array in t2 should qualify. I expect to receive results as in first case but I get no rows.
Is there any method like: ARRAY_AGG contains another ARRAY_AGG?

Erwin Brandstetter · Accepted Answer

Clean up

It would be inefficient to use two differing invocations of array_agg(). Use the same (ORDER BY in SELECT list and HAVING clause):

SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM   t1
GROUP  BY 1
HAVING array_agg(entity_type ORDER BY entity_type) = (
   SELECT array_agg(entity_type ORDER BY entity_type)
   FROM   t2
   WHERE  id = 1
   -- GROUP  BY id   -- not needed
   );

Syntax basics in the manual.

"contains" operator `@>`

Like Nick commented, your 2nd query would work with the "array contains" operator @>

SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM   t1
GROUP  BY 1
HAVING array_agg(entity_type ORDER BY entity_type) @> (
   SELECT array_agg(entity_type ORDER BY entity_type)
   FROM   t2
   WHERE  id = 1
   );

But this is very inefficient for big tables.

Faster query

The problem can be cast as a case of relational division. Depending on your table definition, there are more efficient techniques. We have gathered a whole arsenal under this related question:

How to filter SQL results in a has-many-through relation

Assuming (id, entity_type) is unique in both tables, this should be substantially faster for big tables, especially because it can use an index on t1 (as opposed to your original query):

SELECT t1.id
FROM   t2
JOIN   t1 USING (entity_type)
WHERE  t2.id = 1
GROUP  BY 1
HAVING count(*) = (SELECT count(*) FROM t2 WHERE id = 1);

You need two indexes:

First on t2(id), typically covered by the primary key.
Second:

CREATE INDEX t1_foo_idx ON t1 (entity_type, id);

The added id column is optional to allow index-only scans. Sequence of columns is essential:

Is a composite index also good for queries on the first field?

fiddle
_{Old sqlfiddle}

array_agg contains another array_agg

Tags:

arrays

sql

postgresql

relational-division

Marcin

1 Answers

Clean up

"contains" operator `@>`

Faster query

Erwin Brandstetter

Recent Activity

Donate For Us

array_agg contains another array_agg

Tags:

arrays

sql

postgresql

relational-division

Marcin

1 Answers

Clean up

"contains" operator @>

Faster query

Erwin Brandstetter

Related questions

Recent Activity

Donate For Us

"contains" operator `@>`