Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inner Join a Table to Itself

I have a table that uses two identifying columns, let's call them id and userid. ID is unique in every record, and userid is unique to the user but is in many records.

What I need to do is get a record for the User by userid and then join that record to the first record we have for the user. The logic of the query is as follows:

SELECT v1.id, MIN(v2.id) AS entryid, v1.userid
FROM views v1
INNER JOIN views v2
  ON v1.userid = v2.userid

I'm hoping that I don't have to join the table to a subquery that handles the min() piece of the code as that seems to be quite slow.

like image 708
Dave Long Avatar asked Dec 03 '12 17:12

Dave Long


2 Answers

I guess (it's not entirely clear) you want to find for every user, the rows of the table that have minimum id, so one row per user.

In that case, you an use a subquery (a derived table) and join it to the table:

SELECT v.*
FROM views AS v
  JOIN
    ( SELECT userid, MIN(id) AS entryid
      FROM views
      GROUP BY userid
    ) AS vm
    ON  vm.userid = v.userid 
    AND vm.entryid = v.id ;

The above can also be written using a Common Table Expression (CTE), if you like them:

; WITH vm AS
    ( SELECT userid, MIN(id) AS entryid
      FROM views
      GROUP BY userid
    )
  SELECT v.*
  FROM views AS v
    JOIN vm
      ON  vm.userid = v.userid 
      AND vm.entryid = v.id ;

Both would be quite efficient with an index on (userid, id).

With SQL-Server, you could write this using the ROW_NUMBER() window function:

; WITH viewsRN AS
    ( SELECT *
           , ROW_NUMBER() OVER (PARTITION BY userid ORDER BY id) AS rn
      FROM views
    ) 
  SELECT *                      --- skipping the "rn" column
  FROM viewsRN
  WHERE rn = 1 ;
like image 97
ypercubeᵀᴹ Avatar answered Oct 21 '22 09:10

ypercubeᵀᴹ


Well, to use the MIN function along with non-aggregate columns, you'd have to group the statement. That's possible with the query you have... (EDIT based on additional info)

SELECT MIN(v2.id) AS entryid, v1.id, v1.userid
FROM views v1
INNER JOIN views v2
  ON v1.userid = v2.userid      
GROUP BY v1.id, v1.userid

... however if this is just a simple example and you're looking to pull more data with this query, it quickly becomes an unfeasible solution.

What you seem to want is a list of all the user data in this view, with a link on each row leading back to the "first" record that exists for the same user. The above query will get you what you want, but there are much easier ways to determine the first record for each user:

SELECT v1.id, v1.userid
FROM views v1
ORDER BY v1.userid, v1.id

The first record for each unique user is your "entry point". I think I understand why you want to do it the way you specified, and the first query I gave will be reasonably performant, but you'll have to consider whether not having to use the order by clause to get the correct answer is worth it.

like image 40
KeithS Avatar answered Oct 21 '22 10:10

KeithS