Using SQL Server, I'm trying to query a kind of averaged count from a table I didn't design. Basically I want a list, grouped by one column, with the number of distinct values of another column matching a given criterion and, of those, the number of rows matching another criterion (which I'll use to create the averaged count, or whatever it is). This can't be hard, but I'm having a bad set theory day and any pointers will be gratefully received.
Here's the simplified and genericized scenario (schema and sample data below). Say we have three columns:
objid (has a clustered index)
userid (no index, I might be able to add one)
actiontype (no index, I might be able to add one)
None of these is unique, and none can be null. We want to completely ignore any rows where actiontype is 'none'. We want to know, per userid, how many actiontype = 'flag' rows there are on average per object that user has interacted with.
So if we have "ahmed", "joe", and "maria", and "joe" interacted with 3 objects and raised 5 flags, the number there is 5 / 3 = 1.6666 (repeating); if "ahmed" interacted with 3 objects and didn't raise any flags, his number would be 0; if "maria" interacted with 5 objects and raised 4 flags, her number would be 4 / 5 = 0.8:
+--------+------------------+
| userid | flags_per_object |
+--------+------------------+
| ahmed  | 0                |
| joe    | 1.66666667       |
| maria  | 0.8              |
+--------+------------------+
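To make the arithmetic concrete, here is a sketch of the two intermediate numbers per user, run against the sample table tmp from the setup script in this question. The column aliases objects and flags are just illustrative names, not part of the real schema.

```sql
-- Per user: how many distinct objects they touched (ignoring 'none' rows)
-- and how many 'flag' rows they raised. Aliases are illustrative only.
select userid,
       count(distinct objid) as objects,
       sum(case when actiontype = 'flag' then 1 else 0 end) as flags
from tmp
where actiontype <> 'none'
group by userid
order by userid
-- With the sample data this gives:
--   ahmed  3  0
--   joe    3  5
--   maria  5  4
```

Dividing flags by objects for each row gives the flags_per_object values in the table above.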
I won't be remotely surprised if this is closed as a duplicate, I'm just not finding it.
Here's the simplified table setup and sample data:
create table tmp
(
    objid varchar(254) not null,
    userid varchar(254) not null,
    actiontype varchar(254) not null
)
create clustered index tmp_objid on tmp(objid)
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'update')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'close')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'close')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('delta', 'joe', 'update')
insert into tmp (objid, userid, actiontype) values ('alpha', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'close')
insert into tmp (objid, userid, actiontype) values ('delta', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('epsilon', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('alpha', 'ahmed', 'none')
insert into tmp (objid, userid, actiontype) values ('beta', 'ahmed', 'none')
insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'none')
insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'update')
insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'update')
insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'close')
insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'update')
insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'close')
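Since userid and actiontype currently have no index, and adding one might be possible, something along these lines could be tried for per-user queries. This is only a sketch: the index name is made up for illustration, and whether the optimizer actually uses it depends on the real data and row counts.

```sql
-- Hypothetical index; the name is invented for illustration.
-- objid does not need to be listed: as the clustered index key it is
-- carried in every nonclustered index row automatically.
create nonclustered index tmp_userid_actiontype on tmp(userid, actiontype)
```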
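You can try the following:
select t1.userid,
    -- flags per object: flag rows (cnt1) divided by distinct objects (cnt2);
    -- users with no flag rows have no match in b, so ISNULL maps them to 0
    ISNULL(cast(b.cnt1 as float) / t1.cnt2, 0) as num
FROM
(
    -- distinct objects each user touched, ignoring 'none' rows
    select t1.userid, COUNT(distinct t1.objid) as cnt2
    from tmp as t1
    where t1.actiontype <> 'none'
    group by t1.userid
) t1
LEFT JOIN
(
    -- number of 'flag' rows per user
    SELECT t2.userid, COUNT(*) as cnt1
    FROM tmp as t2
    WHERE t2.actiontype = 'flag'
    GROUP BY t2.userid
) b ON (b.userid = t1.userid)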
Even though it looks uglier than your solution, it surprisingly generates a better execution plan on the test data you provided.
(Answering my own question.)
I do have something that works:
select userid,
       cast(count(case when actiontype = 'flag' then 1 else null end) as float)
       /
       count(distinct(objid))
       as flags_per_object
from tmp
where actiontype <> 'none'
group by userid
...but I can't help feeling there's a better way.
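One alternative that reads closer to the wording "flags on average per object" is to count flags per (userid, objid) pair first and then let AVG do the division. This is a sketch against the same sample table, not a claim that it performs better; the derived-table alias per_object and the column alias flags are names I've made up here.

```sql
-- Step 1 (inner query): one row per user and object, with that object's flag count.
-- Step 2 (outer query): average those counts per user, which is the same as
-- total flags divided by distinct objects.
select userid,
       avg(cast(flags as float)) as flags_per_object
from (
    select userid,
           objid,
           sum(case when actiontype = 'flag' then 1 else 0 end) as flags
    from tmp
    where actiontype <> 'none'
    group by userid, objid
) as per_object
group by userid
order by userid
```

For joe the inner query yields 4, 1 and 0 flags for alpha, beta and delta, so the average is 5 / 3, matching the single-query version. The cast to float matters: without it AVG over an integer column does integer division in SQL Server.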