I have a data-analysis question, that I could easily solve with some T-SQL or some scripting, but I was wondering if there was a clever SQL solution. The problem is that it messes a bit with SQL's row-independence assumption a bit.
I have a table that consists of name-value pairs associated with a user and ordered by submission, for example:
ID USERID VARIABLE VALUE SUBMITTED 3115 2287 votech05 2 2009-02-02 15:34:00 3116 2287 comcol05 1 2009-02-02 15:34:00 3117 2287 fouryr05 1 2009-02-02 15:35:00 3118 2287 none05 2 2009-02-02 15:35:00 3119 2287 ocol1_05 2 2009-02-02 15:44:00 3120 2287 disnone 2 2009-02-02 15:45:00 3121 2287 dissense 2 2009-02-02 15:49:00 3122 2287 dismobil 3 2009-02-02 15:51:00 3123 2287 dislearn 3 2009-02-02 15:51:00 3124 2287 disment 3 2009-02-02 15:52:00 3125 2287 disother 2 2009-02-02 15:55:00 3126 2287 disrefus 7 2009-02-02 15:58:00
I'd like to be able to determine the value and count of the largest group of identical values (when the data is ordered the ID primary key). So, for the above example, because I have four value=2 appearing in sequence, and only three value=3, I would want to report:
USERID VALUE COUNT 2287 2 4
for the given user.
Again, this would could be done fairly-quickly using other tools, but since the data set is quite large (about 75 million records) and frequently changing, it would be nice to be able to solve this problem with a query. I'm working with SQL Server 2005.
(Edited after comment)
You can do that by assigning a "head" number to each group of consecutive values. After that you select the head number for each row, and do an aggregate per head.
Here's an example, with CTE's for readability:
WITH
OrderedTable as (
select value, rownr = row_number() over (order by userid, id)
from YourTable
where userid = 2287
),
Heads as (
select cur.rownr, CurValue = cur.value
, headnr = row_number() over (order by cur.rownr)
from OrderedTable cur
left join OrderedTable prev on cur.rownr = prev.rownr+1
where IsNull(prev.value,-1) != cur.value
),
ValuesWithHead as (
select value
, HeadNr = (select max(headnr)
from Heads
where Heads.rownr <= data.rownr)
from OrderedTable data
)
select Value, [Count] = count(*)
from ValuesWithHead
group by HeadNr, value
order by count(*) desc
This will output:
Value Count
2 4
3 3
1 2
2 1
2 1
7 1
Use "top 1" to select the first row only.
Here's my query to create the test data:
create table YourTable (
id int primary key,
userid int,
variable varchar(25),
value int
)
insert into YourTable (id, userid, variable, value) values (3115, 2287, 'votech05', 2)
insert into YourTable (id, userid, variable, value) values (3116, 2287, 'comcol05', 1)
insert into YourTable (id, userid, variable, value) values (3117, 2287, 'fouryr05', 1)
insert into YourTable (id, userid, variable, value) values (3118, 2287, 'none05', 2)
insert into YourTable (id, userid, variable, value) values (3119, 2287, 'ocol1_05', 2)
insert into YourTable (id, userid, variable, value) values (3120, 2287, 'disnone', 2)
insert into YourTable (id, userid, variable, value) values (3121, 2287, 'dissense', 2)
insert into YourTable (id, userid, variable, value) values (3122, 2287, 'dismobil', 3)
insert into YourTable (id, userid, variable, value) values (3123, 2287, 'dislearn', 3)
insert into YourTable (id, userid, variable, value) values (3124, 2287, 'disment', 3)
insert into YourTable (id, userid, variable, value) values (3125, 2287, 'disother', 2)
insert into YourTable (id, userid, variable, value) values (3126, 2287, 'disrefus', 7)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With