Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query for run-length, or consecutive identical value encoding

My goal is to take a set of data that is ordered by id and return a resultset that indicates the number of consecutive rows where the val column is identical. E.g. given this data:

| id | val |
|  1 |  33 |
|  2 |  33 |
|  3 |  44 |
|  4 |  28 |
|  5 |  44 |
|  6 |  44 |

I would like to see this result:

| id | val | run_length |
| 1  | 33  | 2          |
| 3  | 44  | 1          |
| 4  | 28  | 1          |
| 5  | 44  | 2          |

The id column in the resultset is optional. In fact, if it makes it significantly harder, then just leave that column out of the result. I sort of like having it because it "pins" the resultset to a particular location in the table.

I am primarily interested in the result in free database engines. My order of preference for a solution is:

  1. SQLite
  2. Postgres
  3. MySQL
  4. Oracle
  5. SQL Server
  6. Sybase
like image 544
Terrence Brannon Avatar asked Jun 14 '15 13:06

Terrence Brannon


1 Answers

I'll choose #2 on your list, because this is incredibly painful to do in SQLite with a single query. The following is standard SQL:

select min(id), val, count(*) as runlength
from (select t.*,
             (row_number() over (order by id) -
              row_number() over (partition by val order by id)
             ) as grp
      from data t
     ) t
group by grp, val;

This uses the difference of two row number calculations to identify the seuqnces of identical values. It should work in the recent versions of databases 2, 4, 5, and 6.

like image 176
Gordon Linoff Avatar answered Oct 11 '22 22:10

Gordon Linoff