Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Top N Per Group with Multiple Table Joins

Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from get all results into get top 3 per group. At first this was going well and I used several recommendations and answers from this site to achieve this (Most Viewed Products). However, I'm running into difficulty with my last one "Best Selling Products" because of multiple joins.

Basically, I need to get all products in order by # highest sales per product in which the maximum products per vendor is 3 I've got multiple tables being joined to create the original query, and each time I attempt to use the variables to generate rankings it produces invalid results. The following should help better understand the issue (I've removed unnecessary fields for brevity):

Product Table

productid | vendorid | approved | active | deleted

Vendor Table

vendorid | approved | active | deleted

Order Table

orderid | `status` | deleted

Order Items Table

orderitemid | orderid | productid | price

Now, my original query to get all results is as follows:

SELECT COUNT(oi.price) AS `NumSales`, 
       p.productid, 
       p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;

Finally, (and here's where I'm stumped), I'm trying to alter the above statement such that I received only the top 3 product (by # sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.

like image 298
Jeremy Avatar asked Mar 16 '11 00:03

Jeremy


1 Answers

Even though you specify LIMIT 100, this type of query will require a full scan and table to be built up, then every record inspected and row numbered before finally filtering for the 100 that you want to display.

select
    vendorid, productid, NumSales
from
(
    select
        vendorid, productid, NumSales,
        @r := IF(@g=vendorid,@r+1,1) RowNum,
        @g := vendorid
    from (select @g:=null) initvars
    CROSS JOIN 
    (
        SELECT COUNT(oi.price) AS NumSales, 
               p.productid, 
               p.vendorid
        FROM products p
        INNER JOIN vendors v ON (p.vendorid = v.vendorid)
        INNER JOIN orders_items oi ON (p.productid = oi.productid)
        INNER JOIN orders o ON (oi.orderid = o.orderid)
        WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
        AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
        AND o.`Status` = 'SETTLED'
        AND o.Deleted = 0
        GROUP BY p.vendorid, p.productid
        ORDER BY p.vendorid, NumSales DESC
    ) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;

The approach here is

  1. Group by to get NumSales
  2. Use variables to row number the sales per vendor/product
  3. Filter the numbered dataset to allow for a max of 3 per vendor
  4. Order the remaining by NumSales DESC and return only 100
like image 106
RichardTheKiwi Avatar answered Sep 20 '22 20:09

RichardTheKiwi