I've got two tables: Log(id,user,action,date) and ActionTypes(action,type). Given an action A0, and a type T0, I would like to count for each user, how many times she has used each other action Ai right after A0, but skipping the actions of Log which are not of type T0. So for example:
Log:
id user action date
----------------------------------------
1 mary start 2012-07-16 08:00:00
2 mary open 2012-07-16 09:00:00
3 john start 2012-07-16 09:00:00
4 mary play 2012-07-16 10:00:00
5 john open 2012-07-16 10:30:00
6 mary start 2012-07-16 11:00:00
7 mary jump 2012-07-16 12:00:00
8 mary close 2012-07-16 13:00:00
9 mary delete 2012-07-16 14:00:00
10 mary start 2012-07-16 15:00:00
11 mary open 2012-07-16 16:00:00
ActionTypes:
action type
--------------
start 0
open 1
play 1
jump 2
close 1
delete 1
So, given the action 'start' and the type '1' the answer will be:
user action ntimes
------------------------
mary open 2
mary close 1
john open 1
My attempt is
SELECT b.user,b.action, count(*)
FROM log a, log b
WHERE a.action='start' AND b.date>a.date AND a.user=b.user AND
1=(select type from ActionTypes where action=b.action) AND
not exists (SELECT c.action FROM log c where c.user=a.user AND
c.date>a.date and c.date<b.date and
1=(select type from ActionTypes where action=c.action))
GROUP BY b.user,b.action
Our Log table has about 1 million tuples and the query works, but it is too slow. We're using SQLServer. Any hints on how to make it faster? Thanks
Could you please give this query a try? It uses exists to test if previous chronological record is of requested type. I believe that it would be faster than self-join. I have put a demo @ Sql Fiddle.
select log.[user], log.action, count(*) ntimes
from log
inner join actiontype t
on log.action = t.action
where t.type = 1
and exists (select *
from
(select top 1 t1.type
from log l1
inner join actiontype t1
on l1.action = t1.action
where l1.[user] = log.[user]
and l1.date < log.date
and t1.type in (0, 1)
order by l1.date desc
) prevEntry
where prevEntry.type = 0
)
group by log.[user], log.action
I don't understand why is mary
\ close
in result list. Previous record is jump which is of type 2
, and should not be skipped over to get to start.
After borrowing @Nikola Markovinović's setup, I came up with the following solution:
WITH ranked AS (
SELECT
L1.[user],
L2.action,
rnk = ROW_NUMBER() OVER (PARTITION BY L1.id ORDER BY L2.date)
FROM Log L1
INNER JOIN Log L2 ON L2.[user] = L1.[user] AND L2.date > L1.date
INNER JOIN ActionType at ON L2.action = at.action
WHERE L1.action = @Action
AND at.type = @Type
)
SELECT
[user],
action,
ntimes = COUNT(*)
FROM ranked
WHERE rnk = 1
GROUP BY
[user],
action
;
Basically, this query selects from the Log
table all users' records that have the specified action, then joins that subset back to Log
to retrieve all actions of the specified type that follow those in the first subset, ranking them in the ascending order of date
along the way (using the ROW_NUMBER()
function). Then the query retrieves only rows with the rankings of 1
, groups them by user
and action
and counts rows in the groups.
You can see (and play with) a working example at SQL Fiddle.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With