Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

improving performance in sql with multiple tables

I've got two tables: Log(id,user,action,date) and ActionTypes(action,type). Given an action A0, and a type T0, I would like to count for each user, how many times she has used each other action Ai right after A0, but skipping the actions of Log which are not of type T0. So for example:

Log:

id   user   action        date
----------------------------------------
1    mary   start   2012-07-16 08:00:00
2    mary   open    2012-07-16 09:00:00
3    john   start   2012-07-16 09:00:00
4    mary   play    2012-07-16 10:00:00
5    john   open    2012-07-16 10:30:00
6    mary   start   2012-07-16 11:00:00
7    mary   jump    2012-07-16 12:00:00
8    mary   close   2012-07-16 13:00:00
9    mary   delete  2012-07-16 14:00:00
10   mary   start   2012-07-16 15:00:00
11   mary   open    2012-07-16 16:00:00

ActionTypes:

action  type
--------------
start   0
open    1
play    1
jump    2
close   1
delete  1

So, given the action 'start' and the type '1' the answer will be:

user   action    ntimes
------------------------
mary   open      2
mary   close     1
john   open      1

My attempt is

SELECT b.user,b.action, count(*)
FROM log a, log b
WHERE a.action='start' AND b.date>a.date AND a.user=b.user AND
      1=(select type from ActionTypes where action=b.action) AND
      not exists (SELECT c.action FROM log c where c.user=a.user AND                  
                  c.date>a.date and c.date<b.date and                            
                  1=(select type from ActionTypes where action=c.action))
GROUP BY b.user,b.action

Our Log table has about 1 million tuples and the query works, but it is too slow. We're using SQLServer. Any hints on how to make it faster? Thanks

like image 634
SAL PIMIENTA Avatar asked Feb 20 '23 20:02

SAL PIMIENTA


2 Answers

Could you please give this query a try? It uses exists to test if previous chronological record is of requested type. I believe that it would be faster than self-join. I have put a demo @ Sql Fiddle.

select log.[user], log.action, count(*) ntimes
  from log
 inner join actiontype t
    on log.action = t.action
 where t.type = 1
   and exists (select *
                 from 
                   (select top 1 t1.type
                      from log l1
                     inner join actiontype t1
                        on l1.action = t1.action
                     where l1.[user] = log.[user]
                       and l1.date < log.date
                       and t1.type in (0, 1)
                     order by l1.date desc
                   ) prevEntry
                where prevEntry.type = 0
               )
 group by log.[user], log.action

I don't understand why is mary \ close in result list. Previous record is jump which is of type 2, and should not be skipped over to get to start.

like image 191
Nikola Markovinović Avatar answered Mar 04 '23 09:03

Nikola Markovinović


After borrowing @Nikola Markovinović's setup, I came up with the following solution:

WITH ranked AS (
  SELECT
    L1.[user],
    L2.action,
    rnk = ROW_NUMBER() OVER (PARTITION BY L1.id ORDER BY L2.date)
  FROM Log L1
    INNER JOIN Log L2 ON L2.[user] = L1.[user] AND L2.date > L1.date
    INNER JOIN ActionType at ON L2.action = at.action
  WHERE L1.action = @Action
    AND at.type   = @Type
)
SELECT
  [user],
  action,
  ntimes = COUNT(*)
FROM ranked
WHERE rnk = 1
GROUP BY
  [user],
  action
;

Basically, this query selects from the Log table all users' records that have the specified action, then joins that subset back to Log to retrieve all actions of the specified type that follow those in the first subset, ranking them in the ascending order of date along the way (using the ROW_NUMBER() function). Then the query retrieves only rows with the rankings of 1, groups them by user and action and counts rows in the groups.

You can see (and play with) a working example at SQL Fiddle.

like image 33
Andriy M Avatar answered Mar 04 '23 10:03

Andriy M