I'm looking for help using sum() in my SQL query:
SELECT links.id, count(DISTINCT stats.id) as clicks, count(DISTINCT conversions.id) as conversions, sum(conversions.value) as conversion_value FROM links LEFT OUTER JOIN stats ON links.id = stats.parent_id LEFT OUTER JOIN conversions ON links.id = conversions.link_id GROUP BY links.id ORDER BY links.created desc;
I use DISTINCT
because I'm doing "group by" and this ensures the same row is not counted more than once.
The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by)
I basically want to do SUM(conversions.value)
for each DISTINCT conversions.id.
Is that possible?
SELECT links.id, count(DISTINCT stats.id) as clicks, count(DISTINCT conversions.id) as conversions, sum(conversions. value) as conversion_value FROM links LEFT OUTER JOIN stats ON links.id = stats. parent_id LEFT OUTER JOIN conversions ON links.id = conversions. link_id GROUP BY links.id ORDER BY links.
MySQL COUNT(DISTINCT) function returns a count of number rows with different non-NULL expr values. Where expr is a given expression. The following MySQL statement will count the unique 'pub_lang' and average of 'no_page' up to 2 decimal places for each group of 'cate_id'.
I may be wrong but from what I understand
Thus for each conversions.id you have at most one links.id impacted.
You request is a bit like doing the cartesian product of 2 sets :
[clicks] SELECT * FROM links LEFT OUTER JOIN stats ON links.id = stats.parent_id [conversions] SELECT * FROM links LEFT OUTER JOIN conversions ON links.id = conversions.link_id
and for each link, you get sizeof([clicks]) x sizeof([conversions]) lines
As you noted the number of unique conversions in your request can be obtained via a
count(distinct conversions.id) = sizeof([conversions])
this distinct manages to remove all the [clicks] lines in the cartesian product
but clearly
sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
In your case, since
count(*) = sizeof([clicks]) x sizeof([conversions]) count(*) = sizeof([clicks]) x count(distinct conversions.id)
you have
sizeof([clicks]) = count(*)/count(distinct conversions.id)
so I would test your request with
SELECT links.id, count(DISTINCT stats.id) as clicks, count(DISTINCT conversions.id) as conversions, sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value FROM links LEFT OUTER JOIN stats ON links.id = stats.parent_id LEFT OUTER JOIN conversions ON links.id = conversions.link_id GROUP BY links.id ORDER BY links.created desc;
Keep me posted ! Jerome
Jeromes solution is actually wrong and can produce incorrect results!!
sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
let's assume the following table
conversions id value 1 5 1 5 1 5 2 2 3 1
the correct sum of value for distinct ids would be 8. Jerome's formula produces:
sum(conversions.value) = 18 count(distinct conversions.id) = 3 count(*) = 5 18*3/5 = 9.6 != 8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With