This is a very basic query I can't figure out.... Let's say I have a two column table like this: <pre class="prettyprint"><code>userid | roleid --------|-------- 1 | 1 1 | 2 1 | 3 2 | 1 </code></pre> I want to get all distinct userids that have <code>roleids</code> 1, 2 AND 3. Using the above example, the only result I want returned is <code>userid</code> 1. How do I do this?

Ok, I got downvoted on this so I decided to test it: <pre class="prettyprint"><code>CREATE TABLE userrole ( userid INT, roleid INT, PRIMARY KEY (userid, roleid) ); CREATE INDEX ON userrole (roleid); </code></pre> Run this: <pre class="prettyprint"><code><?php ini_set('max_execution_time', 120); // takes over a minute to insert 500k+ records $start = microtime(true); echo "<pre>\n"; mysql_connect('localhost', 'scratch', 'scratch'); if (mysql_error()) { echo "Connect error: " . mysql_error() . "\n"; } mysql_select_db('scratch'); if (mysql_error()) { echo "Selct DB error: " . mysql_error() . "\n"; } $users = 200000; $count = 0; for ($i=1; $i<=$users; $i++) { $roles = rand(1, 4); $available = range(1, 5); for ($j=0; $j<$roles; $j++) { $extract = array_splice($available, rand(0, sizeof($available)-1), 1); $id = $extract[0]; query("INSERT INTO userrole (userid, roleid) VALUES ($i, $id)"); $count++; } } $stop = microtime(true); $duration = $stop - $start; $insert = $duration / $count; echo "$count users added.\n"; echo "Program ran for $duration seconds.\n"; echo "Insert time $insert seconds.\n"; echo "</pre>\n"; function query($str) { mysql_query($str); if (mysql_error()) { echo "$str: " . mysql_error() . "\n"; } } ?> </code></pre> Output: <pre class="prettyprint"><code>499872 users added. Program ran for 56.5513510704 seconds. Insert time 0.000113131663847 seconds. </code></pre> That adds 500,000 random user-role combinations and there are approximately 25,000 that match the chosen criteria. First query: <pre class="prettyprint"><code>SELECT userid FROM userrole WHERE roleid IN (1, 2, 3) GROUP by userid HAVING COUNT(1) = 3 </code></pre> Query time: 0.312s <pre class="prettyprint"><code>SELECT t1.userid FROM userrole t1 JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2 JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3 AND t1.roleid = 1 </code></pre> Query time: 0.016s That's right. The join version I proposed is twenty times faster than the aggregate version. Sorry but I do this for a living and work in the real world and in the real world we test SQL and the results speak for themselves. The reason for this should be pretty clear. The aggregate query will scale in cost with the size of the table. Every row is processed, aggregated and filtered (or not) through the <code>HAVING</code> clause. The join version will (using an index) select a subset of the users based on a given role, then check that subset against the second role and finally that subset against the third role. Each selection (in relational algebra terms) works on an increasingly small subset. From this you can conclude: The performance of the join version gets even better with a lower incidence of matches. If there were only 500 users (out of the 500k sample above) that had the three stated roles, the join version will get significantly faster. The aggregate version will not (and any performance improvement is a result of transporting 500 users instead of 25k, which the join version obviously gets too). I was also curious to see how a real database (ie Oracle) would deal with this. So I basically repeated the same exercise on Oracle XE (running on the same Windows XP desktop machine as the MySQL from the previous example) and the results are almost identical. Joins seem to be frowned upon but as I've demonstrated, aggregate queries can be an order of magnitude slower. Update: After some extensive testing, the picture is more complicated and the answer will depend on your data, your database and other factors. The moral of the story is test, test, test.

Select values that meet different conditions on different rows?

Tags:

sql

mysql

sql-match-all

relational-division

This is a very basic query I can't figure out....

Let's say I have a two column table like this:

userid  |  roleid --------|--------    1    |    1    1    |    2    1    |    3    2    |    1

I want to get all distinct userids that have roleids 1, 2 AND 3. Using the above example, the only result I want returned is userid 1. How do I do this?

739

asked Jan 25 '09 01:01

John

1 Answers

Ok, I got downvoted on this so I decided to test it:

CREATE TABLE userrole (   userid INT,   roleid INT,   PRIMARY KEY (userid, roleid) );  CREATE INDEX ON userrole (roleid);

Run this:

<?php ini_set('max_execution_time', 120); // takes over a minute to insert 500k+ records   $start = microtime(true);  echo "<pre>\n"; mysql_connect('localhost', 'scratch', 'scratch'); if (mysql_error()) {     echo "Connect error: " . mysql_error() . "\n"; } mysql_select_db('scratch'); if (mysql_error()) {     echo "Selct DB error: " . mysql_error() . "\n"; }  $users = 200000; $count = 0; for ($i=1; $i<=$users; $i++) {     $roles = rand(1, 4);     $available = range(1, 5);     for ($j=0; $j<$roles; $j++) {         $extract = array_splice($available, rand(0, sizeof($available)-1), 1);         $id = $extract[0];         query("INSERT INTO userrole (userid, roleid) VALUES ($i, $id)");         $count++;     } }  $stop = microtime(true); $duration = $stop - $start; $insert = $duration / $count;  echo "$count users added.\n"; echo "Program ran for $duration seconds.\n"; echo "Insert time $insert seconds.\n"; echo "</pre>\n";  function query($str) {     mysql_query($str);     if (mysql_error()) {         echo "$str: " . mysql_error() . "\n";     } } ?>

Output:

499872 users added. Program ran for 56.5513510704 seconds. Insert time 0.000113131663847 seconds.

That adds 500,000 random user-role combinations and there are approximately 25,000 that match the chosen criteria.

First query:

SELECT userid FROM userrole WHERE roleid IN (1, 2, 3) GROUP by userid HAVING COUNT(1) = 3

Query time: 0.312s

SELECT t1.userid FROM userrole t1 JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2 JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3 AND t1.roleid = 1

Query time: 0.016s

That's right. The join version I proposed is twenty times faster than the aggregate version.

Sorry but I do this for a living and work in the real world and in the real world we test SQL and the results speak for themselves.

The reason for this should be pretty clear. The aggregate query will scale in cost with the size of the table. Every row is processed, aggregated and filtered (or not) through the HAVING clause. The join version will (using an index) select a subset of the users based on a given role, then check that subset against the second role and finally that subset against the third role. Each selection (in relational algebra terms) works on an increasingly small subset. From this you can conclude:

The performance of the join version gets even better with a lower incidence of matches.

If there were only 500 users (out of the 500k sample above) that had the three stated roles, the join version will get significantly faster. The aggregate version will not (and any performance improvement is a result of transporting 500 users instead of 25k, which the join version obviously gets too).

I was also curious to see how a real database (ie Oracle) would deal with this. So I basically repeated the same exercise on Oracle XE (running on the same Windows XP desktop machine as the MySQL from the previous example) and the results are almost identical.

Joins seem to be frowned upon but as I've demonstrated, aggregate queries can be an order of magnitude slower.

Update: After some extensive testing, the picture is more complicated and the answer will depend on your data, your database and other factors. The moral of the story is test, test, test.

134

answered Sep 23 '22 10:09

cletus

Related questions
                            
                                Wordpress $wpdb. Insert Multiple Records
                            
                                How do I get the MySQL table structure in PHP? Plus a list of all tables?
                            
                                Validation failed for query for method JPQL
                            
                                Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
                            
                                Could not open mysql.plugin table. Some plugins may be not loaded
                            
                                version of mysql2 (0.3.2) doesn't ship with the ActiveRecord adapter bundled anymore as it's now part of Rails 3.1
                            
                                Convert DateTime for MySQL using C#
                            
                                Copy 1 column to another by mysql query
                            
                                mysql concat function
                            
                                Limit length of longtext field in SELECT results
                            
                                How to find duplicate rows based on multiple fields in MySQL? [duplicate]
                            
                                Getting first day of the week in MySql using Week No
                            
                                UNION query with codeigniter's active record pattern
                            
                                How to quote values using group_concat
                            
                                Error "1038 Out of sort memory, consider increasing sort buffer size
                            
                                whats the difference between '!=' and '<>' in mysql
                            
                                How to check if a value already exists to avoid duplicates?
                            
                                Import SQL file by command line in Windows 7
                            
                                Python & MySql: Unicode and Encoding
                            
                                I keep getting this mysql error code #1089

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With