Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to SELECT DISTINCT Info with TOP 1 Info and an Order By FROM the Top 1 Info

I have 2 tables, that look like:

CustomerInfo(CustomterID, CustomerName)
CustomerReviews(ReviewID, CustomerID, Review, Score)

I want to search reviews for a string and return CustomerInfo.CustomerID and CustomerInfo.CustomerName. However, I only want to show distinct CustomerID and CustomerName along with just one of their CustomerReviews.Reviews and CustomerReviews.Score. I also want to order by the CustomerReviews.Score.

I can't figure out how to do this, since a customer can leave multiple reviews, but I only want a list of customers with their highest scored review.

Any ideas?

like image 815
Erin Taylor Avatar asked Jun 03 '10 21:06

Erin Taylor


2 Answers

This is the greatest-n-per-group problem that has come up dozens of times on Stack Overflow.

Here's a solution that works with a window function:

WITH CustomerCTE (
  SELECT i.*, r.*, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Score DESC) AS RN
  FROM CustomerInfo i
  INNER JOIN CustomerReviews r ON i.CustomerID = r.CustomerID 
  WHERE CONTAINS(r.Review, '"search"')
)
SELECT * FROM CustomerCTE WHERE RN = 1
ORDER BY Score;

And here's a solution that works more broadly with RDBMS brands that don't support window functions:

SELECT i.*, r1.*
FROM CustomerInfo i
INNER JOIN CustomerReviews r1 ON i.CustomerID = r1.CustomerID 
  AND CONTAINS(r1.Review, '"search"')
LEFT OUTER JOIN CustomerReviews r2 ON i.CustomerID = r2.CustomerID 
  AND CONTAINS(r1.Review, '"search"')
  AND (r1.Score < r2.Score OR r1.Score = r2.Score AND r1.ReviewID < r2.ReviewID)
WHERE r2.CustomerID IS NULL
ORDER BY Score;

I'm showing the CONTAINS() function because you should be using the fulltext search facility in SQL Server, not using LIKE with wildcards.

like image 116
Bill Karwin Avatar answered Sep 22 '22 19:09

Bill Karwin


I voted for Bill Karwin's answer, but I thought I'd throw out another option.

It uses a correlated subquery, which can often incur performance problems with large data sets, so use with caution. I think the only upside is that the query is easier to immediately understand.

select *
from [CustomerReviews] r
where [ReviewID] =
(
    select top 1 [ReviewID]
    from [CustomerReviews] rInner
    where rInner.CustomerID = r.CustomerID
    order by Score desc
)
order by Score desc

I didn't add the string search filter, but that can be easily added.

like image 45
Andy White Avatar answered Sep 25 '22 19:09

Andy White