Average of rows where column = A within distinct rows on another column grouped by a third column

Tags: sql, sql-server

Using SQL Server, I'm trying to query a kind of averaged count from a table I didn't design. Basically I want a list, grouped by one column, with the number of distinct values of another column matching a given criterion, and of those, the number of rows matching another criterion (which I'll use to create the averaged count, or whatever it is). This can't be hard, but I'm having a bad set theory day, and any pointers will be gratefully received.

Here's the simplified and genericized scenario (schema and sample data below). Say we have three columns:

  • objid (has a clustered index)
  • userid (no index, I might be able to add one)
  • actiontype (no index, I might be able to add one)

None of these is unique, and none can be null. We want to completely ignore any rows where actiontype is 'none'. We want to know, per userid, how many actiontype = 'flag' rows there are on average per object that user has interacted with.

So if we have "ahmed", "joe", and "maria", and "joe" interacted with 3 objects and raised 5 flags, his number is 5 / 3 = 1.6666 recurring; if "ahmed" interacted with 3 objects and didn't raise any flags, his number would be 0; if "maria" interacted with 5 objects and raised 4 flags, her number would be 4 / 5 = 0.8:

+--------+------------------+
| userid | flags_per_object |
+--------+------------------+
| ahmed  | 0                |
| joe    | 1.66666667       |
| maria  | 0.8              |
+--------+------------------+

I won't be remotely surprised if this is closed as a duplicate, I'm just not finding it.

Here's the simplified table setup and sample data:

create table tmp
(
    objid      varchar(254) not null,
    userid     varchar(254) not null,
    actiontype varchar(254) not null
)
create clustered index tmp_objid on tmp(objid)

insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'update')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'close')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')

insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'close')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'flag')

insert into tmp (objid, userid, actiontype) values ('gamma', 'joe', 'none')

insert into tmp (objid, userid, actiontype) values ('delta', 'joe', 'update')

insert into tmp (objid, userid, actiontype) values ('alpha', 'maria', 'update')

insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag')

insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'close')

insert into tmp (objid, userid, actiontype) values ('delta', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('epsilon', 'maria', 'update')

insert into tmp (objid, userid, actiontype) values ('alpha', 'ahmed', 'none')

insert into tmp (objid, userid, actiontype) values ('beta', 'ahmed', 'none')

insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'none')
insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'update')

insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'update')
insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'close')

insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'update')
insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'close')
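
As a sanity check on the sample data, the two per-user counts behind the expected output can be listed separately. This is only a sketch against the tmp table above; the column aliases objects and flags are illustrative names, not part of the original schema.

```sql
-- Per-user building blocks of flags_per_object, ignoring 'none' rows.
-- "objects" and "flags" are illustrative aliases.
select   userid,
         count(distinct objid) as objects,  -- distinct objects the user touched
         sum(case when actiontype = 'flag' then 1 else 0 end) as flags
from     tmp
where    actiontype <> 'none'
group by userid
order by userid
```

With the rows above, this gives ahmed 3 objects and 0 flags, joe 3 and 5, and maria 5 and 4, which matches the figures used in the expected output.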
T.J. Crowder asked Jun 04 '11 14:06



2 Answers

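You can try the following:

select  t1.userid,
        -- flags (cnt1) divided by distinct objects (cnt2);
        -- users with no flags have no row in b, so cnt1 is NULL and ISNULL yields 0
        CASE t1.cnt2
            WHEN 0 THEN 0
            ELSE ISNULL(cast(b.cnt1 as float) / t1.cnt2, 0)
        END as num
FROM
(
    -- cnt2: distinct objects each user touched, ignoring 'none' rows
    select   userid, COUNT(distinct objid) as cnt2
    from     tmp
    where    actiontype <> 'none'
    group by userid
) t1
LEFT JOIN
(
    -- cnt1: number of 'flag' rows per user
    SELECT   userid, COUNT(*) as cnt1
    FROM     tmp
    WHERE    actiontype = 'flag'
    GROUP BY userid
) b ON (b.userid = t1.userid)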

Even though it looks uglier than your solution, it surprisingly generates a better execution plan based on test data you provided.
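
One way to check a claim like this on your own data is to compare the I/O and timing figures SQL Server reports for each query. A minimal sketch, assuming it is run in a session against the tmp table; the index name tmp_userid_actiontype is hypothetical, and whether such an index helps at all depends on the real data:

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
set statistics io on;
set statistics time on;

-- Optional, hypothetical index name: a nonclustered index on the grouping and
-- filter columns. It also carries the clustered key (objid) as its row locator.
create nonclustered index tmp_userid_actiontype on tmp (userid, actiontype);

-- Run each candidate query here and compare the figures in the Messages output.

set statistics io off;
set statistics time off;
```

The sample table is tiny, so differences measured on it may not carry over to a production-sized table.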

a1ex07 answered Sep 22 '22 02:09


(Answering my own question.)

I do have something that works:

select   userid,
         cast(count(case when actiontype = 'flag' then 1 else null end) as float)
         /
         count(distinct(objid))
         as flags_per_object
from     tmp
where    actiontype <> 'none'
group by userid

...but I can't help feeling there's a better way.
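
The query above relies on count(expression) skipping NULLs, so the case expression counts only the 'flag' rows. A tiny self-contained sketch of that behaviour, using an inline list of made-up values rather than the tmp table (the values constructor assumes SQL Server 2008 or later):

```sql
-- count(expr) ignores NULLs; a case without else yields NULL for non-matching rows.
select count(case when v = 'flag' then 1 end) as flag_rows,  -- counts 2 of the 3 rows
       count(*)                               as all_rows    -- counts all 3 rows
from   (values ('flag'), ('update'), ('flag')) as t(v)
```

This is also why the "else null" in the query above is optional: a case expression with no else branch already returns NULL.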

T.J. Crowder answered Sep 18 '22 02:09