Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Within-group aggregates with no sub-queries - suggested test data updated

I have two tables in MySQL sales database:

Orders table:

CREATE TABLE salestest.`orders` (  
`ID` int(11) unsigned NOT NULL auto_increment,  
`OrderDate` datetime NOT NULL,  
`CustomerID` int(11) unsigned NOT NULL,  
PRIMARY KEY (`ID`),  
UNIQUE KEY `ID` (`ID`),  
KEY `OrderDate` (`OrderDate`),  
KEY `CustomerID` (`CustomerID`)  
) ENGINE=InnoDB;  

INSERT INTO salestest.orders VALUES  
( 1, '2012-04-15', 1 ),  
( 2, '2012-05-20', 1 ),  
( 3, '2012-06-30', 1 );  

OrderDetails table:

CREATE TABLE salestest.`OrderDetails` (  
`ID` int(11) unsigned NOT NULL auto_increment,  
`OrderID` int(11) unsigned NOT NULL,  
`ProductID` int(11) unsigned NOT NULL,  
`Price` double NOT NULL default '0',  
PRIMARY KEY  (`ID`),  
UNIQUE KEY `ID` (`ID`),  
KEY `OrderID` (`OrderID`),  
KEY `ProductID` (`ProductID`),  
CONSTRAINT `OrderID_fk` FOREIGN KEY (`OrderID`) REFERENCES `orders` (`ID`)  
) ENGINE=InnoDB;  

INSERT INTO salestest.OrderDetails VALUES  
( 1, 1, 1, 2 ),  
( 2, 1, 2, 15 ),  
( 3, 1, 3, 22 ),  
( 4, 2, 1, 3 ),  
( 5, 2, 2, 17 ),  
( 6, 2, 3, 23 ),  
( 7, 2, 4, 40 ),  
( 8, 3, 1, 4 ),  
( 9, 3, 2, 20 );  

Now I need to select for each customer, the last price they purchase each product.

The easy way to do it is by using a subquery:

SELECT od2.CustomerID,od2.ProductID, od2.Price AS LastPrice, od2.OrderDate AS LastDate  
FROM (SELECT o1.ID, o1.CustomerID, o1.OrderDate, od1.ProductID, od1.Price  
  FROM orders AS o1  
  LEFT JOIN OrderDetails as od1 ON o1.ID=od1.OrderID  
  ORDER BY OrderDate DESC) AS od2  
GROUP BY CustomerID, ProductID  
ORDER BY CustomerID, ProductID;  

Result:

CustomerID ProductID LastPrice LastDate
1 1 4 2012-06-30 00:00:00
1 2 20 2012-06-30 00:00:00
1 3 23 2012-05-20 00:00:00
1 4 40 2012-05-20 00:00:00

Now the question; how is it possible to get the same result if I want to avoid sub-query, temp tables or a view, I only want to use joins; this query is a small part of a much larger query, and having sub-query is highly inefficient.

I tried this query:

SELECT o1.CustomerID,od1.ProductID, od1.Price AS LastPrice, o1.OrderDate AS LastDate
FROM Orders AS o1 LEFT JOIN OrderDetails as od1 ON o1.ID=od1.OrderID
GROUP BY CustomerID, ProductID
ORDER BY CustomerID, ProductID;

but it gives a different result:

CustomerID ProductID LastPrice LastDate
1 1 2 2012-04-15 00:00:00
1 2 15 2012-04-15 00:00:00
1 3 22 2012-04-15 00:00:00
1 4 40 2012-05-20 00:00:00

As you see, LastPrice & LastDate are not correct. I also tried Allen's suggestion, but the result is:

CustomerID ProductID LastPrice LastDate
1 1 4 2012-06-30 00:00:00
1 2 20 2012-06-30 00:00:00

first query from spencer's answer results duplicated products:

CustomerID ProductID LastPrice LastDate
1 3 22 2012-04-15 00:00:00
1 3 23 2012-05-20 00:00:00
1 4 40 2012-05-20 00:00:00
1 1 4 2012-06-30 00:00:00
1 2 20 2012-06-30 00:00:00

other answers all use sub-query, which I am trying to avoid.
any suggestions?

like image 283
user1499268 Avatar asked Nov 14 '22 02:11

user1499268


1 Answers

Look for "greatest-n-per-group"

It's the greatest thing I've ever learned in SQL, I hope you love it too.

OK, here's my stab at it:

SELECT o.CustomerID, od.ProductID, od.Price AS LastPrice, o.OrderDate AS LastDate  
FROM OrderDetails od
LEFT JOIN orders as o ON od.OrderID = o.ID
LEFT JOIN orders as o2 ON o.CustomerID = o2.CustomerID AND o.id < o2.id
WHERE o2.id IS NULL
ORDER BY o.CustomerID, od.ProductID;

You want to know, by customer + product, what the last time that customer bought each product and what they paid for it.

So I started with the product, joined the orders (first join), then joined orders again so that I can limit the query to a single order per customer + product (o2 matches all orders up to, but not including the most recent order). We then use the fact that o2 doesn't match the most recent order to only select that one row.

This assumes that you won't have the same item twice in one order with different pricing and that newer orders will always have a higher ID.

Hopefully this gets you close enough that your real data/query can modify as needed - Good luck!

like image 134
Allen Holman Avatar answered Dec 04 '22 14:12

Allen Holman