Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql: avg vs sum/count

I have 4 variables:

  • amdt_adopt
  • com_amdt_adopt
  • amdt_tabled
  • com_amdt_tabled

I want to compute the average of the ratio:

(amdt_adopt - com_amdt_adopt) / (amdt_tabled - com_amdt_tabled)

If I use the built-in avg function:

select avg((amdt_adopt - com_amdt_adopt) / (amdt_tabled - com_amdt_tabled)) as final_res
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0;

I get 0.44771942.

If I use a sum and then divides by count:

select 
(
    select sum((amdt_adopt - com_amdt_adopt) / (amdt_tabled - com_amdt_tabled))
    from europolix.act_act
    where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
) 
/ 
(
    select count(*) 
    from europolix.act_act
    where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
) as final_res;

I get 0.20883803.

I have different results but these two queries are supposed to be equivalent! What's wrong?

EDIT

Cast to decimal as suggested:

select avg((cast(amdt_adopt - com_amdt_adopt as decimal(8,3)))  / (cast(amdt_tabled - com_amdt_tabled as decimal(8,3)))) as final_res
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0;

select 
(
    select sum((cast(amdt_adopt - com_amdt_adopt as decimal(8,3))) / (cast(amdt_tabled - com_amdt_tabled as decimal(8,3))))
    from europolix.act_act
    where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
) 
/ 
(
    select count(*) 
    from europolix.act_act
    where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
) as final_res;

Still same result :(.

like image 424
rom Avatar asked Mar 17 '23 15:03

rom


1 Answers

If you have NULL values in records for some of those columns that you use in avg, the results of avg or sum/count might easily differ.

E.g.:

create table t (id int, x int, y int);

insert into t (id, x, y) values (1, 1, 1);
insert into t (id, x, y) values (2, 2, 3);
insert into t (id, x, y) values (3, 3, 4);
insert into t (id, x, y) values (4, NULL, NULL);


select avg(x), sum(x), count(x), count(*), sum(x)/count(*) from t;

Will give you:

AVG(X)  SUM(X)  COUNT(X)    COUNT(*)    SUM(X)/COUNT(*)
2       6       3           4           1.5

Similarly, AVG won't consider where a division by zero occurs.

E.g.

select avg(x/(y-x)) from t where x>0 and y>0;
2.5

select sum(x/(y-x))/count(*) from t where x>0 and y>0;
1.66666667

One solution might be to use avg with a condition like this:

select avg(if(y-x=0, 0, x/(y-x))) from t where x>0 and y>0;
1.66666667

An example can be seen here

[edit] updated to consider the possibility of division by zero

like image 193
lp_ Avatar answered Mar 23 '23 09:03

lp_