I'm trying to calculate the absolute deviation of a vector online, that is, as each item in the vector is received, without using the entire vector. The absolute deviation is the sum of the absolute difference between each item in a vector and the mean:
I know that the variance of a vector can be calculated in such a manner. Variance is similar to absolute deviation, but each difference is squared:
The online algorithm for variance is as follows:
n = 0
mean = 0
M2 = 0
def calculate_online_variance(x):
n = n + 1
delta = x - mean
mean = mean + delta/n
M2 = M2 + delta*(x - mean) # This expression uses the new value of mean
variance_n = M2/n
return variance_n
Is there such an algorithm for calculating absolute deviance? I cannot formulate a recursive definition myself, but wiser heads may prevail!
To find the mean absolute deviation of the data, start by finding the mean of the data set. Find the sum of the data values, and divide the sum by the number of data values. Find the absolute value of the difference between each data value and the mean: |data value – mean|.
And the answer for this is 12.75.
As the absolute deviation between x and the mean can be defined as the square root of the squared difference, the adaptation is trivial if you are happy with a consistent but biased estimate (meaning the limit to infinity is the expected value) :
n = 0
mean = 0
M2 = 0
def calculate_online_avg_abs_dev(x):
n = n + 1
delta = x - mean
mean = mean + delta/n
M2 = M2 + sqrt(delta*(x - mean))
avg_abs_dev_n = M2/n
This is for the case of the average absolute deviation. Normally the mad is used (median absolute deviation), which is impossible to program recursively. but the average absolute deviation is as useful in most cases. When we're talking about hundreds of values from close-to-normal distributions, both values are very close.
If you just want the sum of the absolute devations, life is even simpler: just return M2.
Be aware of the fact that BOTH the algorithm you gave and the trivial adaptation for the absolute deviation are slightly biased.
A simulation in R to prove the algorithm works this way :
The red line is the true value, the black line is the progressive value following the algorithm outlined above.
Code :
calculate_online_abs_dev <- function(x,n){
M2=0
mean=0
out <- numeric(n)
for(i in 1:n) {
delta <- x[i] - mean
mean <- mean + delta/i
M2 = M2 + sqrt(delta*(x[i] - mean))
out[i] <- M2/i
}
return(out)
}
set.seed(2010)
x <- rnorm(100)
Abs_Dev <- calculate_online_abs_dev(x,length(x))
True_Val <- sapply(1:length(x),function(i)sum(abs(x[1:i]-mean(x[1:i])))/i)
plot(1:length(x),Abs_Dev,type="l",xlab="number of values",lwd=2)
lines(1:length(x),True_Val,col="red",lty=2,lwd=2)
legend("bottomright",lty=c(1,2),col=c("black","red"),
legend=c("Online Calc","True Value"))
I don't think it's possible.
In the formula for variance it is possible to separate the x and x2 terms, so that it suffices to keep track of those sums (and n). In the formula for absolute deviation this is not possible.
I think the best one can do (apart from keeping the whole vector and calculating the absolute deviation on demand) is keep a sorted list of elements. This is O(log(n)) for each new element, but after you add an element the cost of recalculating the absolute deviation is O(log(n)). This may or may not be worthwhile, depending on your application.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With