Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL join: selecting the last records in a one-to-many relationship

Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT statement. What is the best practice? Any advice on building indexes?

Please use these table/column names in your answer:

  • customer: id, name
  • purchase: id, customer_id, item_id, date

And in more complicated situations, would it be (performance-wise) beneficial to denormalize the database by putting the last purchase into the customer table?

If the (purchase) id is guaranteed to be sorted by date, can the statements be simplified by using something like LIMIT 1?

like image 898
netvope Avatar asked Jan 21 '10 17:01

netvope


People also ask

Is Left join one to many?

SQL LEFT JOIN examples Each location belongs to one and only one country while each country can have zero or more locations. The relationship between the countries and locations tables is one-to-many.

What are the 4 types of JOINs in SQL?

There are four main types of JOINs in SQL: INNER JOIN, OUTER JOIN, CROSS JOIN, and SELF JOIN.

How do you find the latest record in a table?

To get the last record, the following is the query. mysql> select *from getLastRecord ORDER BY id DESC LIMIT 1; The following is the output. The above output shows that we have fetched the last record, with Id 4 and Name Carol.

Does the order of left JOINs matter for performance?

1 Answer. The order doesn't matter for INNER joins. As long as you change your selects from SELECT * to SELECT a.


1 Answers

This is an example of the greatest-n-per-group problem that has appeared regularly on StackOverflow.

Here's how I usually recommend solving it:

SELECT c.*, p1.* FROM customer c JOIN purchase p1 ON (c.id = p1.customer_id) LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND      (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id))) WHERE p2.id IS NULL; 

Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or in the case of ties, a later id). When we find that to be true, then p1 is the most recent purchase for that customer.

Regarding indexes, I'd create a compound index in purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN on MySQL.


Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.

like image 150
Bill Karwin Avatar answered Sep 18 '22 12:09

Bill Karwin