
Selecting most recent answers efficiently

SQL Fiddle: http://sqlfiddle.com/#!3/9b459/6

I have a table containing answers to the question "Will you attend this event?". Each user might respond several times and all answers are stored in the table. Normally we're only interested in the latest answer, and I'm trying to construct an efficient query for that. I'm using SQL Server 2008 R2.

Table contents for one event:

(Image of the table contents not preserved. Columns: EventId, MemberId, Timestamp, Answer.)

Column types: EventId int, MemberId int, Timestamp datetime, Answer bit
Primary key: (EventId, MemberId, Timestamp)

Note that member 18 first answered No and later Yes, member 20 answered Yes at first and later No, and member 11 answered No and later No again. I would like to filter out these members' earlier answers. There might also be more than one answer per member to filter out: a user might, for example, answer Yes, Yes, No, Yes, No, No, No, and only the final No should remain.
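
To make the requirement concrete, here is a small Python sketch of the filtering rule described above, independent of any SQL. The timestamps are invented for illustration (the original table image is not available); only the order of answers for members 11, 18 and 20 follows the description, and `latest_answers` is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical rows (EventId, MemberId, Timestamp, Answer). The timestamps are
# made up; only the answer sequence per member follows the description.
rows = [
    (68, 11, datetime(2013, 1, 20, 9, 0), False),
    (68, 11, datetime(2013, 1, 22, 9, 0), False),
    (68, 18, datetime(2013, 1, 20, 10, 0), False),
    (68, 18, datetime(2013, 1, 23, 10, 0), True),
    (68, 20, datetime(2013, 1, 21, 8, 0), True),
    (68, 20, datetime(2013, 1, 24, 8, 0), False),
]


def latest_answers(rows):
    """Keep, per (EventId, MemberId), only the row with the greatest Timestamp."""
    latest = {}
    for event_id, member_id, ts, answer in rows:
        key = (event_id, member_id)
        if key not in latest or ts > latest[key][2]:
            latest[key] = (event_id, member_id, ts, answer)
    # Order the surviving rows by timestamp, like the queries in this post.
    return sorted(latest.values(), key=lambda r: r[2])


result = latest_answers(rows)
for row in result:
    print(row)
```

Each member appears exactly once in the output, with the answer from their most recent row. That is the result the SQL queries below are meant to produce.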

I have tried a few different ideas and evaluated them in SQL Server Management Studio by entering all the queries, selecting Display Estimated Execution Plan, and comparing each query's cost as a percentage of the total. Is that a good way to evaluate performance?

The different queries tested so far:

-----------------------------------------------------------------
-- Subquery to select Answer (does not include Timestamp)
-- Cost: 63 %
-----------------------------------------------------------------
select distinct a.EventId, a.MemberId,
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  order by Timestamp desc
) as Answer
from    Attendees a
where a.EventId = 68

-----------------------------------------------------------------
-- Where with subquery to find max(Timestamp)
-- Cost: 13 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from     Attendees a
where  a.EventId = 68
and    a.Timestamp =
(
  select max(Timestamp)
  from     Attendees
  where  EventId  = a.EventId
  and    MemberId = a.MemberId
)
order by a.TimeStamp;

-----------------------------------------------------------------
-- Group by to find max(Timestamp)
-- Subquery to select Answer matching max(Timestamp)
-- Cost: 23 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, max(a.Timestamp),
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  and   Timestamp = max(a.Timestamp)
) as Answer
from    Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);

It would be nice to avoid using a subquery for each member. In the last query I tried using group by but still had to use a subquery for the Answer column. I would really like something like this, but that isn't valid SQL of course:

select a.EventId, a.MemberId, max(a.Timestamp), a.Answer <-- Picked from the line selected by max(a.Timestamp)
from  Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);

Any other ideas for an efficient query?


EDIT:

Very impressed by SQL Fiddle, I have entered my actual data there now: http://sqlfiddle.com/#!3/9b459/6

Anlo asked Jan 25 '13 14:01



2 Answers

SQL Server 2008 supports common table expressions (CTEs) and window functions, so you can number each member's answers from newest to oldest and keep only the first row:

WITH recordsList
AS
(
    SELECT  EventID, MemberID, TimeStamp, Answer,
            ROW_NUMBER() OVER (PARTITION BY EventID, MemberID
                                ORDER BY Timestamp DESC) rn
    FROM    tableName
)
SELECT  EventID, MemberID, TimeStamp, Answer
FROM    recordsList
WHERE   rn = 1
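
A minimal runnable sketch of the same ROW_NUMBER() pattern, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made purely so the example is self-contained; window functions need SQLite 3.25 or later). The table name Attendees comes from the question, the rows and timestamps are hypothetical, and the EventId filter and ORDER BY are added to match the question's queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Attendees (
        EventId   INTEGER NOT NULL,
        MemberId  INTEGER NOT NULL,
        Timestamp TEXT    NOT NULL,   -- ISO-8601 text sorts chronologically
        Answer    INTEGER NOT NULL,   -- 0 = No, 1 = Yes (stands in for bit)
        PRIMARY KEY (EventId, MemberId, Timestamp)
    )
""")

# Hypothetical sample rows; only the answer order per member follows the question.
conn.executemany("INSERT INTO Attendees VALUES (?, ?, ?, ?)", [
    (68, 11, "2013-01-20 09:00:00", 0),
    (68, 11, "2013-01-22 09:00:00", 0),
    (68, 18, "2013-01-20 10:00:00", 0),
    (68, 18, "2013-01-23 10:00:00", 1),
    (68, 20, "2013-01-21 08:00:00", 1),
    (68, 20, "2013-01-24 08:00:00", 0),
    (69, 18, "2013-01-25 10:00:00", 0),   # different event, filtered out below
])

latest = conn.execute("""
    WITH recordsList AS (
        SELECT EventId, MemberId, Timestamp, Answer,
               ROW_NUMBER() OVER (PARTITION BY EventId, MemberId
                                  ORDER BY Timestamp DESC) AS rn
        FROM   Attendees
        WHERE  EventId = 68
    )
    SELECT EventId, MemberId, Timestamp, Answer
    FROM   recordsList
    WHERE  rn = 1
    ORDER BY Timestamp
""").fetchall()

for row in latest:
    print(row)
```

Because rn restarts at 1 for every (EventId, MemberId) partition and counts down from the newest timestamp, `rn = 1` selects exactly one row per member no matter how many earlier answers exist.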
John Woo answered Nov 10 '22 10:11



I prefer the CTE approach as well, but here is another option using a subquery that should work:

SELECT T.EventId, T.MemberId, T.TimeStamp, T.Answer
FROM   TableName T
JOIN (
    SELECT   EventId, MemberId, MAX(Timestamp) AS MaxTimeStamp
    FROM     TableName
    GROUP BY EventId, MemberId
) T2 ON  T.EventId   = T2.EventId
     AND T.MemberId  = T2.MemberId
     AND T.TimeStamp = T2.MaxTimeStamp

That said, I imagine the CTE would perform better.

EDIT: I'm no longer sure about the performance. Here is the SQL Fiddle for both; you can see the execution plan for each.
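
As a quick sanity check that the two approaches return the same rows, here is a sketch that runs both against the same data. It uses Python's sqlite3 module rather than SQL Server (an assumption for the sake of a self-contained example; SQLite 3.25 or later is needed for ROW_NUMBER()), so it says nothing about relative performance on SQL Server. Member 7 and all timestamps are hypothetical; member 7's answers follow the Yes, Yes, No, Yes, No, No, No sequence mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Attendees (
        EventId   INTEGER NOT NULL,
        MemberId  INTEGER NOT NULL,
        Timestamp TEXT    NOT NULL,
        Answer    INTEGER NOT NULL,
        PRIMARY KEY (EventId, MemberId, Timestamp)
    )
""")

# Hypothetical member 7 answers Yes, Yes, No, Yes, No, No, No on consecutive days.
rows = [(68, 7, f"2013-01-{day:02d} 12:00:00", answer)
        for day, answer in enumerate([1, 1, 0, 1, 0, 0, 0], start=1)]
# Member 18 answers No, then Yes.
rows += [(68, 18, "2013-01-02 08:00:00", 0),
         (68, 18, "2013-01-05 08:00:00", 1)]
conn.executemany("INSERT INTO Attendees VALUES (?, ?, ?, ?)", rows)

row_number_sql = """
    WITH recordsList AS (
        SELECT EventId, MemberId, Timestamp, Answer,
               ROW_NUMBER() OVER (PARTITION BY EventId, MemberId
                                  ORDER BY Timestamp DESC) AS rn
        FROM   Attendees
    )
    SELECT EventId, MemberId, Timestamp, Answer
    FROM   recordsList
    WHERE  rn = 1
    ORDER BY EventId, MemberId
"""

join_sql = """
    SELECT T.EventId, T.MemberId, T.Timestamp, T.Answer
    FROM   Attendees T
    JOIN (
        SELECT   EventId, MemberId, MAX(Timestamp) AS MaxTimeStamp
        FROM     Attendees
        GROUP BY EventId, MemberId
    ) T2 ON  T.EventId   = T2.EventId
         AND T.MemberId  = T2.MemberId
         AND T.Timestamp = T2.MaxTimeStamp
    ORDER BY T.EventId, T.MemberId
"""

via_row_number = conn.execute(row_number_sql).fetchall()
via_join = conn.execute(join_sql).fetchall()

print(via_row_number)
print("same rows:", via_row_number == via_join)
```

The results match here because the primary key (EventId, MemberId, Timestamp) rules out two rows sharing the same maximum timestamp for one member. Without that guarantee, the join version could return several rows per member on a tie, while ROW_NUMBER() would still return exactly one.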

Good luck.

sgeddes answered Nov 10 '22 09:11
