SQL Fiddle: http://sqlfiddle.com/#!3/9b459/6
I have a table containing answers to the question "Will you attend this event?". Each user might respond several times and all answers are stored in the table. Normally we're only interested in the latest answer, and I'm trying to construct an efficient query for that. I'm using SQL Server 2008 R2.
Table structure (the data for one event is in the SQL Fiddle linked above):
Columns: EventId int, MemberId int, Timestamp datetime, Answer bit
Primary key: (EventId, MemberId, Timestamp)
Note that member 18 first answered No and later Yes, member 20 first answered Yes and later No, and member 11 answered No and later No again. I would like to filter out these members' earlier answers. More than one answer per member may need to be filtered out: a user might, for example, answer Yes, Yes, No, Yes, No, No, No.
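Since the original table listing is not reproduced here, a minimal repro script may help anyone experimenting with the queries below. It assumes the column types and primary key listed above. All rows are hypothetical: the timestamps are invented, and MemberId 99 is a made-up member used to reproduce the Yes, Yes, No, Yes, No, No, No sequence (1 = Yes, 0 = No).

```sql
-- Hypothetical repro of the Attendees table (SQL Server 2008 R2 syntax).
-- Column types and primary key follow the question; every row is invented.
IF OBJECT_ID('tempdb..#Attendees') IS NOT NULL
    DROP TABLE #Attendees;

CREATE TABLE #Attendees
(
    EventId     int      NOT NULL,
    MemberId    int      NOT NULL,
    [Timestamp] datetime NOT NULL,
    Answer      bit      NOT NULL,   -- 1 = Yes, 0 = No
    PRIMARY KEY (EventId, MemberId, [Timestamp])
);

INSERT INTO #Attendees (EventId, MemberId, [Timestamp], Answer)
VALUES
    -- Member 18: No, later Yes
    (68, 18, '2012-10-01T09:00:00', 0),
    (68, 18, '2012-10-02T09:00:00', 1),
    -- Member 20: Yes, later No
    (68, 20, '2012-10-01T10:00:00', 1),
    (68, 20, '2012-10-03T10:00:00', 0),
    -- Member 11: No, later No again
    (68, 11, '2012-10-01T11:00:00', 0),
    (68, 11, '2012-10-04T11:00:00', 0),
    -- Hypothetical member 99: Yes, Yes, No, Yes, No, No, No
    (68, 99, '2012-10-01T12:00:00', 1),
    (68, 99, '2012-10-02T12:00:00', 1),
    (68, 99, '2012-10-03T12:00:00', 0),
    (68, 99, '2012-10-04T12:00:00', 1),
    (68, 99, '2012-10-05T12:00:00', 0),
    (68, 99, '2012-10-06T12:00:00', 0),
    (68, 99, '2012-10-07T12:00:00', 0);

-- Every answer, oldest first per member; only the last row per member should survive filtering.
SELECT EventId, MemberId, [Timestamp], Answer
FROM #Attendees
ORDER BY MemberId, [Timestamp];
```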
I have tried a few different ideas and evaluated them in SQL Server Management Studio by entering all the queries in one window, choosing Display Estimated Execution Plan, and comparing each query's cost as a percentage of the batch. Is that a good way to evaluate performance?
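On the measuring question: the percentages shown by Display Estimated Execution Plan are the optimizer's cost estimates relative to the whole batch, not measured run times, so they can mislead. A common complement is to run each candidate with SET STATISTICS IO and SET STATISTICS TIME enabled and compare the logical reads and CPU/elapsed times that appear in the Messages tab. The sketch below wraps the max(Timestamp) query this way; @Attendees is a hypothetical table-variable stand-in with invented rows, so point the query at the real Attendees table (with realistic data volumes) to get meaningful numbers, and wrap each other candidate the same way.

```sql
-- Hypothetical stand-in for the Attendees table; rows are invented.
DECLARE @Attendees TABLE
(
    EventId     int      NOT NULL,
    MemberId    int      NOT NULL,
    [Timestamp] datetime NOT NULL,
    Answer      bit      NOT NULL,   -- 1 = Yes, 0 = No
    PRIMARY KEY (EventId, MemberId, [Timestamp])
);

INSERT INTO @Attendees (EventId, MemberId, [Timestamp], Answer)
VALUES
    (68, 18, '2012-10-01T09:00:00', 0),
    (68, 18, '2012-10-02T09:00:00', 1),
    (68, 20, '2012-10-01T10:00:00', 1),
    (68, 20, '2012-10-03T10:00:00', 0);

-- Report actual logical reads and CPU/elapsed time in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate from the question: correlated max(Timestamp) subquery.
-- Table variables must be aliased to be referenced inside the subquery.
SELECT a.EventId, a.MemberId, a.[Timestamp], a.Answer
FROM @Attendees a
WHERE a.EventId = 68
  AND a.[Timestamp] = (SELECT MAX(b.[Timestamp])
                       FROM @Attendees b
                       WHERE b.EventId = a.EventId
                         AND b.MemberId = a.MemberId)
ORDER BY a.[Timestamp];

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```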
The different queries tested so far:
-----------------------------------------------------------------
-- Subquery to select Answer (does not include Timestamp)
-- Cost: 63 %
-----------------------------------------------------------------
select distinct a.EventId, a.MemberId,
(
select top 1 Answer
from Attendees
where EventId = a.EventId
and MemberId = a.MemberId
order by Timestamp desc
) as Answer
from Attendees a
where a.EventId = 68
-----------------------------------------------------------------
-- Where with subquery to find max(Timestamp)
-- Cost: 13 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from Attendees a
where a.EventId = 68
and a.Timestamp =
(
select max(Timestamp)
from Attendees
where EventId = a.EventId
and MemberId = a.MemberId
)
order by a.TimeStamp;
-----------------------------------------------------------------
-- Group by to find max(Timestamp)
-- Subquery to select Answer matching max(Timestamp)
-- Cost: 23 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, max(a.Timestamp),
(
select top 1 Answer
from Attendees
where EventId = a.EventId
and MemberId = a.MemberId
and Timestamp = max(a.Timestamp)
) as Answer
from Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);
It would be nice to avoid running a subquery for each member. In the last query I tried GROUP BY, but I still needed a subquery for the Answer column. What I would really like is something like the following, which of course is not valid SQL:
select a.EventId, a.MemberId, max(a.Timestamp), a.Answer -- Answer taken from the row selected by max(a.Timestamp)
from Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);
Any other ideas for an efficient query?
EDIT:
I'm very impressed by SQL Fiddle; I have now entered my actual data there: http://sqlfiddle.com/#!3/9b459/6
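SQL Server 2008 supports common table expressions (CTEs) and ranking window functions such as ROW_NUMBER(). Number each member's answers from newest to oldest and keep only the row numbered 1: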
WITH recordsList
AS
(
SELECT EventID, MemberID, TimeStamp, Answer,
ROW_NUMBER() OVER (PARTITION BY EventID, MemberID
ORDER BY Timestamp DESC) rn
FROM tableName
)
SELECT EventID, MemberID, TimeStamp, Answer
FROM recordsList
WHERE rn = 1
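Applied to the question's table, a minimal sketch might filter on the event inside the CTE and order the result the way the question's queries do. @Attendees is a hypothetical stand-in with invented rows (one row belongs to a different event to show the filter); with the real table, replace @Attendees with Attendees. Because the primary key (EventId, MemberId, Timestamp) matches the PARTITION BY and ORDER BY columns, the optimizer may be able to avoid a separate sort, but check the actual plan to confirm.

```sql
-- Hypothetical stand-in for the Attendees table; rows are invented.
DECLARE @Attendees TABLE
(
    EventId     int      NOT NULL,
    MemberId    int      NOT NULL,
    [Timestamp] datetime NOT NULL,
    Answer      bit      NOT NULL,   -- 1 = Yes, 0 = No
    PRIMARY KEY (EventId, MemberId, [Timestamp])
);

INSERT INTO @Attendees (EventId, MemberId, [Timestamp], Answer)
VALUES
    (68, 18, '2012-10-01T09:00:00', 0),
    (68, 18, '2012-10-02T09:00:00', 1),
    (68, 20, '2012-10-01T10:00:00', 1),
    (68, 20, '2012-10-03T10:00:00', 0),
    (68, 11, '2012-10-01T11:00:00', 0),
    (68, 11, '2012-10-04T11:00:00', 0),
    (69, 18, '2012-10-05T09:00:00', 1);  -- a different event; must be filtered out

WITH LatestAnswers AS
(
    SELECT EventId, MemberId, [Timestamp], Answer,
           ROW_NUMBER() OVER (PARTITION BY EventId, MemberId
                              ORDER BY [Timestamp] DESC) AS rn
    FROM @Attendees
    WHERE EventId = 68            -- WHERE is applied before the rows are numbered
)
SELECT EventId, MemberId, [Timestamp], Answer
FROM LatestAnswers
WHERE rn = 1                      -- newest answer per member
ORDER BY [Timestamp];
-- Expected: one row each for members 18 (Yes), 20 (No) and 11 (No).
```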
I prefer the CTE approach as well, but here is another option using a subquery that should work:
SELECT T.EventId, T.MemberId, T.TimeStamp, T.Answer
FROM TableName T
JOIN (
SELECT EventId, MemberId, Max(Timestamp) MaxTimeStamp
FROM TableName
GROUP BY EventId, MemberId ) T2 ON T.EventId = T2.EventId
AND T.MemberId = T2.MemberId
AND T.TimeStamp = T2.MaxTimeStamp
That said, I would expect the CTE to perform better.
EDIT: I'm no longer sure about performance. Here is the SQL Fiddle for both queries, where you can see the execution plan for each.
Good luck.