I have a one-to-many relationship of rooms and their occupants:
Room | User
1 | 1
1 | 2
1 | 4
2 | 1
2 | 2
2 | 3
2 | 5
3 | 1
3 | 3
Given a list of users, e.g. 1, 3, what is the most efficient way to determining which room is completely/perfectly filled by them? So in this case, it should return room 3 because, although they are both in room 2, room 2 has other occupants as well, which is not a "perfect" fit.
I can think of several solutions to this, but am not sure about the efficiency. For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
Or I can do a count of the total number of users in a room AND containing both users 1 and 3. I will then select the room which has the count of users equal to two.
Note I want to most efficient way, or at least a way that scales up to millions of users and rooms. Each room will have around 25 users. Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5
and so on? Or is there a way to pass the values as an array into a stored procedure?
Any help would be appreciated.
For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
First, a word of advice, to improve your level of function as a developer. Stop thinking of the data, and of the solution, in terms of CSVs. It limits you to thinking in spreadsheet terms, and prevents you from thinking in Relational Data terms. You do not need to construct strings, and then match strings, when the data is in the database, you can match it there.
Now then, in Relational data terms, what exactly do you want ? You want the rooms where the count of users that match your argument user list is highest. Is that correct ? If so, the code is simple.
You haven't given the tables. I will assume room, user, room_user,
with deadly ids
on the first two, and a composite key on the third. I can give you the SQL solution, you will have to work out how to do it in the non-SQL.
Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?
To pass the list to the stored proc, because it needs a single calling parm, the length of which is variable, you have to create a CSV list of users. Let's call that parm @user_list.
(Note, that is not contemplating the data, that is passing a list to a proc in a single parm, because you can't pass an unknown number of identified users to a proc otherwise.)
Since you constructed the @user_list
on the client, you may as well compute @user_count
(the number of members in the list) while you are at it, on the client, and pass that to the proc.
Something like:
CREATE PROC room_user_match_sp (
@user_list CHAR(255),
@user_count INT
...
)
AS
-- validate parms, etc
...
SELECT room_id,
match_count,
match_count / @user_count * 100 AS match_pct
FROM (
SELECT room_id,
COUNT(user_id) AS match_count -- no of users matched
FROM room_user
WHERE user_id IN ( @user_list )
GROUP BY room_id -- get one row per room
) AS match_room -- has any matched users
WHERE match_count = MAX( match_count ) -- remove this while testing
It is not clear, if you want full matches only. In that case, use:
WHERE match_count = @user_count
You have asked for a proc-based solution, so I have given that. Yes, it is the fastest. But keep in mind that for this kind of requirement and solution, you could construct the SQL string on the client, and execute it on the "server" in the usual manner, without using a proc. The proc is faster here only because the code is compiled and that step is removed, as opposed to that step being performed every time the client calls the "server" with the SQL string.
The point I am making here is, with the data in a reasonably Relational form, you can obtain the result you are seeking using a single SELECT
statement, you don't have to mess around with work tables or temp tables or intermediate steps, which requires a proc. Here, the proc is not required, you are implementing a proc for performance reasons.
I make this point because it is clear from your question that your expectation of the solution is "gee, I can't get the result directly, I have work with the data first, I am ready and willing to do that". Such intermediate work steps are required only when the data is not Relational.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With