Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the longest sequence of consecutive increasing numbers in SQL

Tags:

sql

sql-server

For this example say I have a table with two fields, AREA varchar(30) and OrderNumber INT.

The table has the following data

AREA      | OrderNumber
Fontana   |       32
Fontana   |       42
Fontana   |       76
Fontana   |       12
Fontana   |        3
Fontana   |       99
RC        |       32
RC        |        1
RC        |        8
RC        |        9
RC        |        4

I would like to return

The results I would like to return is for each area the longest length of increasing consecutive values. For Fontana it is 3 (32, 42, 76). For RC it is 2 (8,9)

AREA    | LongestLength
Fontana |          3
RC      |          2

How would I do this on MS Sql 2005?

like image 747
Pat Rick Allen Avatar asked Mar 12 '13 18:03

Pat Rick Allen


People also ask

Which algorithm is used to find the longest consecutive subsequence?

The idea is to use Hashing. We first insert all elements in a Set. Then check all the possible starts of consecutive subsequences.

How do you find the longest array sequence?

Find the longest running positive sequence in an array. Examples: Input : arr[] = {1, 2, -3, 2, 3, 4, -6, 1, 2, 3, 4, 5, -8, 5, 6} Output :Index : 7, length : 5 Input : arr[] = {-3, -6, -1, -3, -8} Output : No positive sequence detected.


2 Answers

One way is to use a recursive CTE that steps over each row. If the row meets the criteria (increasing order number for the same area), you increase the chain length by one. If it doesn't, you start a new chain:

; with  numbered as
        (
        select  row_number() over (order by area, eventtime) rn
        ,       *
        from    Table1
        )
,       recurse as
        (
        select  rn
        ,       area
        ,       OrderNumber
        ,       1 as ChainLength
        from    numbered
        where   rn = 1
        union all
        select  cur.rn
        ,       cur.area
        ,       cur.OrderNumber
        ,       case
                when cur.area = prev.area 
                     and cur.OrderNumber > prev.OrderNumber 
                     then prev.ChainLength + 1
                else 1
                end
        from    recurse prev
        join    numbered cur
        on      prev.rn + 1 = cur.rn
        )
select  area
,       max(ChainLength)
from    recurse
group by
        area

Live example at SQL Fiddle.

An alternative way is to use a query to find "breaks", that is, rows that end a sequence of increasing order numbers for the same area. The number of rows between breaks is the length.

; with  numbered as
        (
        select  row_number() over (order by area, eventtime) rn
        ,       *
        from    Table1 t1
        )
        -- Select rows that break an increasing chain
,       breaks as
        (
        select  row_number() over (order by cur.rn) rn2
        ,       cur.rn
        ,       cur.Area
        from    numbered cur
        left join
                numbered prev
        on      cur.rn = prev.rn + 1
        where   cur.OrderNumber <= prev.OrderNumber
                or cur.Area <> prev.Area
                or prev.Area is null
        )
        -- Add a final break after the last row
,       breaks2 as
        (
        select  *
        from    breaks
        union all
        select  count(*) + 1
        ,       max(rn) + 1
        ,       null
        from    breaks
        )
select  series_start.area
,       max(series_end.rn - series_start.rn)
from    breaks2 series_start
join    breaks2 series_end
on      series_end.rn2 = series_start.rn2 + 1
group by
        series_start.area

Live example at SQL Fiddle.

like image 158
Andomar Avatar answered Oct 20 '22 00:10

Andomar


You do not explain why RC's longest sequence does not include 1 while Fontana's does include 32. I take it, the 1 is excluded because it is a decrease: it comes after 32. The Fontana's 32, however, is the first ever item in the group, and I've got two ideas how to explain why it is considered an increase. That's either exactly because it's the group's first item or because it is also positive (i.e. as if coming after 0 and, therefore, an increase).

For the purpose of this answer, I'm assuming the latter, i.e. a group's first item is an increase if it is positive. The below script implements the following idea:

  1. Enumerate the rows in every AREA group in the order of the eventtime column you nearly forgot to mention.

  2. Join the enumerated set to itself to link every row with it's predecessor.

  3. Get the sign of the difference between the row and its preceding value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands one.

  4. Partition every AREA group by the signs determined in #3 and enumerate every subgroup's rows.

  5. Find the difference between the row numbers from #1 and those found in #4. That would be a criterion to identify individual streaks (together with AREA).

  6. Finally, group the results by AREA, the sign from #3 and the result from #5, count the rows and get the maximum count per AREA.

I implemented the above like this:

WITH enumerated AS (
  SELECT
    *,
    row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
  FROM atable
),
signed AS (
  SELECT
    this.eventtime,
    this.AREA,
    this.row,
    sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
  FROM      enumerated AS this
  LEFT JOIN enumerated AS last
    ON this.AREA = last.AREA
   AND this.row  = last.row + 1
),
partitioned AS (
  SELECT
    AREA,
    sgn,
    grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
  FROM signed
)
SELECT DISTINCT
  AREA,
  LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
  AREA,
  grp
;

A SQL Fiddle demo can be found here.

like image 32
Andriy M Avatar answered Oct 19 '22 23:10

Andriy M