Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL where all in array

Tags:

What is the easiest and fastest way to achieve a clause where all elements in an array must be matched - not only one when using IN? After all it should behave like mongodb's $all.

Thinking about group conversations where conversation_users is a join table between conversation_id and user_id I have something like this in mind:

WHERE (conversations_users.user_id ALL IN (1,2)) 

UPDATE 16.07.12

Adding more info about schema and case:

  1. The join-table is rather simple:

                  Table "public.conversations_users"      Column      |  Type   | Modifiers | Storage | Description  -----------------+---------+-----------+---------+-------------  conversation_id | integer |           | plain   |   user_id         | integer |           | plain   |  
  2. A conversation has many users and a user belongs to many conversations. In order to find all users in a conversation I am using this join table.

  3. In the end I am trying to figure out a ruby on rails scope that find's me a conversation depending on it's participants - e.g.:

    scope :between, ->(*users) {   joins(:users).where('conversations_users.user_id all in (?)', users.map(&:id)) } 

UPDATE 23.07.12

My question is about finding an exact match of people. Therefore:

Conversation between (1,2,3) won't match if querying for (1,2)

like image 795
pex Avatar asked Jul 13 '12 10:07

pex


People also ask

Can we use array in where clause?

Array can be passed in WHERE clause of sql query and in PLSQL as well.

How do you query an array?

To query if the array field contains at least one element with the specified value, use the filter { <field>: <value> } where <value> is the element value. To specify conditions on the elements in the array field, use query operators in the query filter document: { <array field>: { <operator1>: <value1>, ... } }


2 Answers

Assuming the join table follows good practice and has a unique compound key defined, i.e. a constraint to prevent duplicate rows, then something like the following simple query should do.

select conversation_id from conversations_users where user_id in (1, 2) group by conversation_id having count(*) = 2 

It's important to note that the number 2 at the end is the length of the list of user_ids. That obviously needs to change if the user_id list changes length. If you can't assume your join table doesn't contain duplicates, change "count(*)" to "count(distinct user_id)" at some possible cost in performance.

This query finds all conversations that include all the specified users even if the conversation also includes additional users.

If you want only conversations with exactly the specified set of users, one approach is to use a nested subquery in the where clause as below. Note, first and last lines are the same as the original query, only the middle two lines are new.

select conversation_id from conversations_users where user_id in (1, 2)    and conversation_id not in    (select conversation_id from conversations_users where user_id not in (1,2)) group by conversation_id having count(*) = 2 

Equivalently, you can use a set difference operator if your database supports it. Here is an example in Oracle syntax. (For Postgres or DB2, change the keyword "minus" to "except.)

select conversation_id from conversations_users where user_id in (1, 2)   group by conversation_id having count(*) = 2 minus   select conversation_id from conversations_users where user_id not in (1,2) 

A good query optimizer should treat the last two variations identically, but check with your particular database to be sure. For example, the Oracle 11GR2 query plan sorts the two sets of conversation ids before applying the minus operator, but skips the sort step for the last query. So either query plan could be faster depending on multiple factors such as the number of rows, cores, cache, indices etc.

like image 178
Alex Blakemore Avatar answered Oct 11 '22 08:10

Alex Blakemore


I'm collapsing those users into an array. I'm also using a CTE (the thing in the WITH clause) to make this more readable.

=> select * from conversations_users ;  conversation_id | user_id -----------------+---------                1 |       1                1 |       2                2 |       1                2 |       3                3 |       1                3 |       2 (6 rows)         => WITH users_on_conversation AS (   SELECT conversation_id, array_agg(user_id) as users   FROM conversations_users   WHERE user_id in (1, 2) --filter here for performance                                                                                         GROUP BY conversation_id ) SELECT * FROM users_on_conversation WHERE users @> array[1, 2];  conversation_id | users -----------------+-------                1 | {1,2}                3 | {1,2} (2 rows)  

EDIT (Some resources)

  • array functions: http://www.postgresql.org/docs/9.1/static/functions-array.html
  • CTEs: http://www.postgresql.org/docs/9.1/static/queries-with.html
like image 32
hgmnz Avatar answered Oct 11 '22 06:10

hgmnz