I have 4 variables:
I want to compute the average of the ratio:
(amdt_adopt - com_amdt_adopt) / (amdt_tabled - com_amdt_tabled)
If I use the built-in avg
function:
select avg((amdt_adopt - com_amdt_adopt) / (amdt_tabled - com_amdt_tabled)) as final_res
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0;
I get 0.44771942
.
If I use a sum
and then divides by count
:
select
(
select sum((amdt_adopt - com_amdt_adopt) / (amdt_tabled - com_amdt_tabled))
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
)
/
(
select count(*)
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
) as final_res;
I get 0.20883803
.
I have different results but these two queries are supposed to be equivalent! What's wrong?
Cast to decimal
as suggested:
select avg((cast(amdt_adopt - com_amdt_adopt as decimal(8,3))) / (cast(amdt_tabled - com_amdt_tabled as decimal(8,3)))) as final_res
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0;
select
(
select sum((cast(amdt_adopt - com_amdt_adopt as decimal(8,3))) / (cast(amdt_tabled - com_amdt_tabled as decimal(8,3))))
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
)
/
(
select count(*)
from europolix.act_act
where validated=2 and com_amdt_adopt>0 and amdt_adopt>0 and com_amdt_tabled>0 and amdt_tabled>0
) as final_res;
Still same result :(.
If you have NULL values in records for some of those columns that you use in avg, the results of avg or sum/count might easily differ.
E.g.:
create table t (id int, x int, y int);
insert into t (id, x, y) values (1, 1, 1);
insert into t (id, x, y) values (2, 2, 3);
insert into t (id, x, y) values (3, 3, 4);
insert into t (id, x, y) values (4, NULL, NULL);
select avg(x), sum(x), count(x), count(*), sum(x)/count(*) from t;
Will give you:
AVG(X) SUM(X) COUNT(X) COUNT(*) SUM(X)/COUNT(*)
2 6 3 4 1.5
Similarly, AVG won't consider where a division by zero occurs.
E.g.
select avg(x/(y-x)) from t where x>0 and y>0;
2.5
select sum(x/(y-x))/count(*) from t where x>0 and y>0;
1.66666667
One solution might be to use avg with a condition like this:
select avg(if(y-x=0, 0, x/(y-x))) from t where x>0 and y>0;
1.66666667
An example can be seen here
[edit] updated to consider the possibility of division by zero
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With