Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MYSQL sum() for distinct rows

Tags:

mysql

sum

I'm looking for help using sum() in my SQL query:

SELECT links.id,         count(DISTINCT stats.id) as clicks,         count(DISTINCT conversions.id) as conversions,         sum(conversions.value) as conversion_value  FROM links  LEFT OUTER JOIN stats ON links.id = stats.parent_id  LEFT OUTER JOIN conversions ON links.id = conversions.link_id  GROUP BY links.id  ORDER BY links.created desc; 

I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.

The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by)

I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.

Is that possible?

like image 240
makeee Avatar asked Mar 12 '10 22:03

makeee


People also ask

How use distinct and sum together in MySQL?

SELECT links.id, count(DISTINCT stats.id) as clicks, count(DISTINCT conversions.id) as conversions, sum(conversions. value) as conversion_value FROM links LEFT OUTER JOIN stats ON links.id = stats. parent_id LEFT OUTER JOIN conversions ON links.id = conversions. link_id GROUP BY links.id ORDER BY links.

How do I count distinct rows in MySQL?

MySQL COUNT(DISTINCT) function returns a count of number rows with different non-NULL expr values. Where expr is a given expression. The following MySQL statement will count the unique 'pub_lang' and average of 'no_page' up to 2 decimal places for each group of 'cate_id'.


2 Answers

I may be wrong but from what I understand

  • conversions.id is the primary key of your table conversions
  • stats.id is the primary key of your table stats

Thus for each conversions.id you have at most one links.id impacted.

You request is a bit like doing the cartesian product of 2 sets :

[clicks] SELECT * FROM links  LEFT OUTER JOIN stats ON links.id = stats.parent_id   [conversions] SELECT * FROM links  LEFT OUTER JOIN conversions ON links.id = conversions.link_id  

and for each link, you get sizeof([clicks]) x sizeof([conversions]) lines

As you noted the number of unique conversions in your request can be obtained via a

count(distinct conversions.id) = sizeof([conversions]) 

this distinct manages to remove all the [clicks] lines in the cartesian product

but clearly

sum(conversions.value) = sum([conversions].value) * sizeof([clicks]) 

In your case, since

count(*) = sizeof([clicks]) x sizeof([conversions]) count(*) = sizeof([clicks]) x count(distinct conversions.id) 

you have

sizeof([clicks]) = count(*)/count(distinct conversions.id) 

so I would test your request with

SELECT links.id,     count(DISTINCT stats.id) as clicks,     count(DISTINCT conversions.id) as conversions,     sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value  FROM links  LEFT OUTER JOIN stats ON links.id = stats.parent_id  LEFT OUTER JOIN conversions ON links.id = conversions.link_id  GROUP BY links.id  ORDER BY links.created desc; 

Keep me posted ! Jerome

like image 60
Jerome WAGNER Avatar answered Sep 18 '22 14:09

Jerome WAGNER


Jeromes solution is actually wrong and can produce incorrect results!!

sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 

let's assume the following table

conversions id value 1 5 1 5 1 5 2 2 3 1 

the correct sum of value for distinct ids would be 8. Jerome's formula produces:

sum(conversions.value) = 18 count(distinct conversions.id) = 3 count(*) = 5 18*3/5 = 9.6 != 8 
like image 45
Clemens Valiente Avatar answered Sep 18 '22 14:09

Clemens Valiente