
Create a summary row for data across multiple tables

Tags: sql, mysql

I'm trying to write a SQL query to generate a summary row for the actions performed by a given user in a given period. I have the following relevant table structure:

users

  • id
  • team

audit_periods (can be processing, shipping, break, etc)

  • user_id
  • period_type (can be "processing", "shipping", etc -- not currently normalized)
  • started_at
  • finished_at (can be null for the current period, hence the logic around times below)

audit_tasks

  • audit_period_id
  • audit_task_type_id
  • created_at
  • score

audit_task_types

  • name ("scan", "place_in_pallet", etc)
  • score (seems redundant, but we need to maintain the score that the audit_task received at the time it was performed, as the audit_task_type score can change later)

For each user for a given period, I'd like to create something like the following row of data:

users.id  |  users.email  |  time_spent_processing  |  time_spent_shipping  |  ...  |  number_of_scans  |  number_of_pallets

which would be calculated by figuring out for each user:

  • What audit_periods fall at least partially in the desired window? (Uses started_at and finished_at.)
  • How long did a user spend in each type of audit_period? (Should involve group by audit_periods.period_type, I'd imagine.)
  • What audit_tasks fall within the desired window? (Uses created_at -- not in the code below yet.)
  • How many of each type of audit_task did a user accomplish during the window? (Joins out to audit_task_type, and likely involves a group by on audit_task_types.name.)
  • How many points were earned during the time period? (Sums the scores of all the audit_tasks in the window.)
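A minimal sketch of the first two steps, assuming the schema above: two intervals overlap when each one starts before the other ends, and the time spent inside the window is the period clipped to the window bounds. The variables @window_start and @window_end are hypothetical names standing in for the date literals used in the query further down, and an open period (finished_at is null) is treated as ending now.

```sql
-- Hypothetical window bounds (names are assumptions, not part of the schema).
SET @window_start = '2011-03-16 00:00:00';
SET @window_end   = '2011-03-17 00:00:00';

SELECT ap.user_id,
       ap.period_type,
       -- Clip the period to the window, then measure what is left in seconds.
       TIMESTAMPDIFF(
           SECOND,
           GREATEST(ap.started_at, CAST(@window_start AS DATETIME)),
           LEAST(COALESCE(ap.finished_at, UTC_TIMESTAMP()),
                 CAST(@window_end AS DATETIME))
       ) AS seconds_in_window
FROM audit_periods AS ap
-- Keep only periods that fall at least partially inside the window.
WHERE ap.started_at < @window_end
  AND COALESCE(ap.finished_at, UTC_TIMESTAMP()) > @window_start;
```

This still returns one row per period; rolling those rows up into one row per user is the remaining part of the problem.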

I've exhausted all of the SQL tricks I know (not many) and came up with something like the following:

select 
    u.id as user_id,
    u.email as email,
    u.team as team,
    ap.period_type as period_type,
    att.name,
    time_to_sec(
      timediff(least("2011-03-17 00:00:00", ifnull(ap.finished_at, utc_timestamp())), greatest("2011-03-16 00:00:00", ap.started_at))
    ) as period_duration,
    sum(at.score) as period_score
  from audit_periods as ap
  inner join users as u on ap.user_id = u.id
  left join audit_tasks as at on at.audit_period_id = ap.id
  left join audit_task_types as att on at.audit_task_type_id = att.id
  where (ap.started_at >= "2011-03-16 00:00:00" or (ap.finished_at >= "2011-03-17 00:00:00" and ap.finished_at <= "2011-03-17 00:00:00"))
    and (ap.finished_at <= "2011-03-17 00:00:00" or (ap.started_at >= "2011-03-16 00:00:00" and ap.started_at <= "2011-03-16 00:00:00"))
    and u.team in ("Foo", "Bar")
  group by u.id, ap.id, at.id

but this seems to be functionally equivalent to just selecting all of the audit tasks in the end. I've tried some subqueries as well, but to little avail. More directly, this generates something like (skipping less important columns):

user_id   |   period_type   |   period_duration  |  name            |   score
1             processing        1800s               scan                200
1             shipping          1000s               place_in_pallet     100
1             shipping          1000s               place_in_pallet     100
1             break             500s                null                null

when I want:

user_id   |   processing    |   shipping  |  break  |  scan  |  place_in_pallet  |  score
1             1800s             1000s        500s      1        2                   400

I can easily fetch all of the audit_tasks for a given user and roll them up in code, but I might be fetching hundreds of thousands of audit_tasks over a given period, so it needs to be done in SQL.

Just to be clear -- I'm looking for a query to generate one row per user, containing summary data collected across the other 3 tables. So, for each user, I want to know how much time he spent in each type of audit_period (3600 seconds processing, 3200 seconds shipping, etc), as well as how many of each audit_task he performed (5 scans, 10 items placed in pallet, etc).

I think I have the elements of a solution, I'm just having trouble piecing them together. I know exactly how I would accomplish this in Ruby/Java/etc, but I don't think I understand SQL well enough to know which tool I'm missing. Do I need a temp table? A union? Some other construct entirely?

Any help is greatly appreciated, and I can clarify if the above is complete nonsense.

asked Mar 17 '11 05:03 by Kyle


1 Answer

You will need to break this up into two crosstab queries: one that gives you the audit_periods information by user and another that gives you the audit_task information by user. Then join both of them to the users table. It isn't clear how you want to roll up the information in each case. For example, if a given user has 10 audit_period rows, how should the query roll up those durations? I assumed a sum of the durations here, but you might want a min, a max, or perhaps even an overall delta.

Select U.id As user_id
    , AuditPeriodByUser.TotalDuration_Processing As processing
    , AuditPeriodByUser.TotalDuration_Shipping As shipping
    , AuditPeriodByUser.TotalDuration_Break As break
    , AuditTasksByUser.TotalCount_Scan As scan
    , AuditTasksByUser.TotalCount_Place_In_Pallet As place_in_pallet
    , AuditTasksByUser.TotalScore As score
From users As U
    Left Join   (
                -- Seconds per period type; a null finished_at (current period) counts up to now
                Select AP.user_id
                    , Sum( Case When AP.period_type = 'processing'
                                Then Time_To_Sec(
                                        TimeDiff(
                                            Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) )
                                End )
                        As TotalDuration_Processing
                    , Sum( Case When AP.period_type = 'shipping'
                                Then Time_To_Sec(
                                        TimeDiff(
                                            Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) )
                                End )
                        As TotalDuration_Shipping
                    , Sum( Case When AP.period_type = 'break'
                                Then Time_To_Sec(
                                        TimeDiff(
                                            Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) )
                                End )
                        As TotalDuration_Break
                From audit_periods As AP
                -- Only periods that lie entirely inside the window are counted
                Where AP.started_at >= @StartDate
                    And Coalesce(AP.finished_at, UTC_TIMESTAMP()) <= @EndDate
                Group by AP.user_id
                ) As AuditPeriodByUser
            On AuditPeriodByUser.user_id = U.id
    Left Join   (
                -- Task counts per type and total score; the name lives on audit_task_types
                Select AP.user_id
                    , Sum( Case When ATT.name = 'scan' Then 1 Else 0 End ) As TotalCount_Scan
                    , Sum( Case When ATT.name = 'place_in_pallet' Then 1 Else 0 End ) As TotalCount_Place_In_Pallet
                    , Sum( AT.score ) As TotalScore
                From audit_tasks As AT
                    Join audit_task_types As ATT
                        On ATT.id = AT.audit_task_type_id
                    Join audit_periods As AP
                        On AP.id = AT.audit_period_id
                Where AP.started_at >= @StartDate
                    And Coalesce(AP.finished_at, UTC_TIMESTAMP()) <= @EndDate
                Group By AP.user_id
                ) As AuditTasksByUser
        On AuditTasksByUser.user_id = U.id
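
The query above counts only periods that lie entirely inside the window. If periods that straddle a window boundary should contribute the part that falls inside it, and tasks should be filtered by their own created_at, a variant along these lines might work. It is a sketch under assumptions, not a tested drop-in: the aliases P, T, and C are made up for illustration, the team filter is copied from the question, and the window is treated as half-open (start inclusive, end exclusive).

```sql
SET @StartDate = '2011-03-16 00:00:00';
SET @EndDate   = '2011-03-17 00:00:00';

SELECT U.id AS user_id
     , P.processing, P.shipping, P.break
     , T.scan, T.place_in_pallet, T.score
FROM users AS U
LEFT JOIN (
    -- Pivot the clipped durations into one column per period type.
    SELECT C.user_id
         , SUM(CASE WHEN C.period_type = 'processing' THEN C.secs ELSE 0 END) AS processing
         , SUM(CASE WHEN C.period_type = 'shipping'   THEN C.secs ELSE 0 END) AS shipping
         , SUM(CASE WHEN C.period_type = 'break'      THEN C.secs ELSE 0 END) AS break
    FROM (
        -- One row per overlapping period, clipped to the window.
        SELECT AP.user_id
             , AP.period_type
             , TIMESTAMPDIFF(
                   SECOND,
                   GREATEST(AP.started_at, CAST(@StartDate AS DATETIME)),
                   LEAST(COALESCE(AP.finished_at, UTC_TIMESTAMP()),
                         CAST(@EndDate AS DATETIME))
               ) AS secs
        FROM audit_periods AS AP
        WHERE AP.started_at < @EndDate
          AND COALESCE(AP.finished_at, UTC_TIMESTAMP()) > @StartDate
    ) AS C
    GROUP BY C.user_id
) AS P ON P.user_id = U.id
LEFT JOIN (
    -- Count tasks by type and total their stored scores, by task creation time.
    SELECT AP.user_id
         , SUM(CASE WHEN ATT.name = 'scan'            THEN 1 ELSE 0 END) AS scan
         , SUM(CASE WHEN ATT.name = 'place_in_pallet' THEN 1 ELSE 0 END) AS place_in_pallet
         , SUM(AT.score) AS score
    FROM audit_tasks AS AT
    JOIN audit_task_types AS ATT ON ATT.id = AT.audit_task_type_id
    JOIN audit_periods AS AP ON AP.id = AT.audit_period_id
    WHERE AT.created_at >= @StartDate
      AND AT.created_at <  @EndDate
    GROUP BY AP.user_id
) AS T ON T.user_id = U.id
WHERE U.team IN ('Foo', 'Bar');
```

Keeping the two aggregations in separate derived tables matters: joining audit_periods directly to audit_tasks repeats each period once per task, so its duration would be summed several times. TIMESTAMPDIFF is used here instead of Time_To_Sec(TimeDiff(...)) because the TIME value returned by TIMEDIFF is capped at 838:59:59.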
answered Sep 29 '22 21:09 by Thomas