Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a trend line from data set SQL

The code below returns the number of resolved tickets and the number of opened tickets for a period (period is YYYY,WW) going back a certain number of days. For example if @NoOfDays is 7:

resolved | opened | week | year | period

56 | 30 | 13 | 2012 | 2012, 13

237 | 222 | 14 | 2012 | 2012, 14

'resolved' and 'opened' are graphed on lines (y) over period (x). I would like to add another column 'trend' that would return a number that when graphed over period, will be a trend line (simple linear regression). I do want to use both sets of values as one data source for the trend.

This is the code I have:

SELECT a.resolved, b.opened, a.weekClosed AS week, a.yearClosed AS year,
    CAST(a.yearClosed as varchar(5)) + ', ' + CAST(a.weekClosed as varchar(5)) AS period
FROM 
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed
    FROM v_rpt_Service
    WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0))
    GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a 
LEFT OUTER JOIN
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) 
    } AS yearEntered
    FROM v_rpt_Service AS v_rpt_Service_1
    WHERE        (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0))
    GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered
ORDER BY year, week

Edit:

According to serc.carleton.edu/files/mathyouneed/best_fit_line_dividing.pdf, it seems that I want to break the data in half, then calculate the average. Then I need to find the best fit line, and use the slope and the y-intercept to calculate the values needed to return in 'trend' using y = mx + b?

I know this is very possible in SQL, however, the program I am inserting the SQL into has limitations on what I can do.

The red and blue dots are the numbers I am returning now(opened and resolved). I need to return a value for every period in 'trend' in order to create the purple line. (this image is hypothetical)

Hypothetical Chart

like image 755
PRNDL Development Studios Avatar asked Apr 06 '12 17:04

PRNDL Development Studios


1 Answers

I was interested in the problem, and I have found that the best way to grok a complex query is to reformat it using my own style and conventions. I applied them to your solution, and the result is below. I've no idea if this will have any value to you...

  • There were a few bits of code that I do not believe are part of the MS T-SQL syntax, such as ({fn xxx } and the WEEK(xxx) function.
  • This code compiles, but I can't run it as I don't have a data table properly configured.
  • I made a host of coding changes that would take a serious lot of explaining, and I'm going to skip most of that. Add a comment if you'd like anything explicated.
  • I tossed in a lot of whitespace. The difference between legible and illegible codes is often just the perception and sensibilities of the beholder, and you might hate my conventions.
  • Not sure what the final result set should be (i.e. which columns get returned)

Some further notes:

  • This query will not get items entered in a week if no items were also closed in that week
  • Weeks may be partial, e.g. not all seven days may be present (adjust @Interval to always including full weeks -- but what about odd numbered intervals?)
  • Multiply the count(*) values by 1.0 to convert them to floats early (avoids cast and integer math truncation)
  • Made it a cte to allow the earlier formulas to be replaced by symbols in the later formulas (at which point things became a lot more legible)

So here's what I came up with:

;WITH cte as (
select
   c.period
  ,resolved_half1
  ,resolved_half2
  ,opened_half1
  ,opened_half2
  ,row = row_number() over(order by c.yearClosed, c.weekClosed)
  ,y1 = ((SUM(resolved_half1) + SUM(opened_half1)) - (SUM(resolved_half2) + SUM(opened_half2))) / ((count(resolved_half1) + count(opened_half1)) / 2)
  ,y2 = ((SUM(resolved_half2) + SUM(opened_half2)) / (count(resolved_half2) + COUNT (opened_half2)))
  ,x1 = ((count(c.period)) / 4)
  ,x2 = (((count(c.period)) / 4) * 3)
 from (select
          a.yearclosed
         ,a.weekClosed
         ,a.resolved_half1
         ,b.yearEntered
         ,b.weekEntered
         ,b.opened_half1
         ,cast(a.yearClosed as varchar(5)) + ', ' + cast(a.weekClosed as varchar(5))  period 
        from (--  Number of items per week that closed within @Interval
              select
                 count(distinct TicketNbr) * 1.0  resolved_half1
                ,datepart(wk, date_closed)        weekClosed
                ,year(date_closed)                yearClosed
               from v_rpt_Service 
               where date_closed >= @FullInterval
               group by
                 datepart(wk, date_closed)
                ,year(date_closed) )  a
         left outer join (--  Number of items per week that were entered within @Interval
                          select 
                             count(distinct TicketNbr) * 1.0  opened_half1
                            ,datepart(wk, date_entered)       weekEntered
                            ,year(date_entered)               yearEntered
                           from v_rpt_Service
                           where date_entered >= @FullInterval
                           group by
                             datepart(wk, date_entered)
                            ,year(date_entered) )  b
          on a.weekClosed = b.weekEntered 
           and a.yearClosed = b.yearEntered)  c
  left outer join (select
                       d.yearclosed
                      ,d.weekClosed
                      ,d.resolved_half2
                      ,e.yearEntered
                      ,e.weekEntered
                      ,e.opened_half2
                      ,cast(yearClosed as varchar(5)) + ', ' + cast(weekClosed as varchar(5))  period 
                    from (select
                             count(distinct TicketNbr) * 1.0  resolved_half2
                            ,datepart(wk, date_closed)        weekClosed
                            ,year(date_closed)                yearClosed
                           from v_rpt_Service
                           where date_closed >= @HalfInterval
                           group by
                             datepart(wk, date_closed) 
                            ,year(date_closed) )  d 
                     left outer join (select
                                         count(distinct TicketNbr) * 1.0  opened_half2
                                        ,datepart(wk, date_entered)       weekEntered
                                        ,year(date_entered)               yearEntered
                                       from v_rpt_Service
                                       where date_entered >= @HalfInterval
                                       group by
                                           datepart(wk, date_entered) 
                                          ,year(date_entered) )  e
                      on d.weekClosed = e.weekEntered
                       and d.yearClosed = e.yearEntered )  f
   on c.period = f.period 
 group by
   c.period
  ,resolved_half1
  ,resolved_half2
  ,opened_half1
  ,opened_half2
  ,c.yearClosed
  ,c.weekClosed
)
SELECT
   row
  ,Period
  ,x1
  ,y1
  ,x2
  ,y2
  ,m = ((y1 - y2) / (x1 - x2))
  ,b = (y2 - (((y1 - y2) / (x1 - x2)) * x2))
  ,trend = ((((y1 - y2) / (x1 - x2)) * (row)) + (y2 - (((y1 - y2) / (x1 - x2)) * x2)))
 from cte
 order by row 

As an addenda, all of subquery "c" could be replaced with something like the following, and "f" with a slightly modified version. Better or worse performance depends on table size, indexing, and other imponderables.

select
   datepart(wk, date_closed)  weekClosed
  ,year(date_closed)          yearClosed
  ,count (distinct case
                  when date_closed >= @FullInterval then TicketNbr
                  else null
                end)          resolved_half1
  ,count (distinct case
                  when date_entered >= @FullInterval then TicketNbr
                  else null
                end)          opened_half1
 from v_rpt_Service 
 where date_closed >= @FullInterval
  or date_entered >= @FullInterval
 group by
   datepart(wk, date_closed)
  ,year(date_closed) 
like image 83
Philip Kelley Avatar answered Oct 07 '22 01:10

Philip Kelley