I have a dataframe storing different values. Sample:
a$open a$high a$low a$close
1.08648 1.08707 1.08476 1.08551
1.08552 1.08623 1.08426 1.08542
1.08542 1.08572 1.08453 1.08465
1.08468 1.08566 1.08402 1.08554
1.08552 1.08565 1.08436 1.08464
1.08463 1.08543 1.08452 1.08475
1.08475 1.08504 1.08427 1.08436
1.08433 1.08438 1.08275 1.08285
1.08275 1.08353 1.08275 1.08325
1.08325 1.08431 1.08315 1.08378
1.08379 1.08383 1.08275 1.08294
1.08292 1.08338 1.08271 1.08325
What I want to do is create a new column, a$mean, storing the mean of a$high and a$low for each row.
Here is how I achieved that:
highlowmean <- function(highs, lows) {
  m <- numeric(length(highs))
  for (i in seq_along(highs)) {
    # mean() takes one vector; a second positional argument is read as `trim`
    m[i] <- mean(c(highs[i], lows[i]))
  }
  return(m)
}
a$mean <- highlowmean(a$high, a$low)
However, I'm fairly new to R and to functional languages in general, so I'm pretty sure there is a more efficient or simpler way to achieve that.
What is the smartest way to do this?
We can use rowMeans:
a$mean <- rowMeans(a[,c('high', 'low')], na.rm=TRUE)
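As a quick check, here is a minimal sketch that applies the line above to the first three rows of the sample data. The data frame is rebuilt by hand, and the column names open, high, low, and close are assumed from the question.

```r
# First three rows of the sample data from the question (rebuilt by hand)
a <- data.frame(
  open  = c(1.08648, 1.08552, 1.08542),
  high  = c(1.08707, 1.08623, 1.08572),
  low   = c(1.08476, 1.08426, 1.08453),
  close = c(1.08551, 1.08542, 1.08465)
)

# Row-wise mean of the high and low columns, ignoring NAs
a$mean <- rowMeans(a[, c("high", "low")], na.rm = TRUE)

# Compare with the explicit element-wise formula (no NAs in this sample)
same <- isTRUE(all.equal(unname(a$mean), (a$high + a$low) / 2))
print(a)
print(same)
```

With no missing values, both computations should agree, so either can be used on the full data.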
NOTE: If there are NA values, it is better to use rowMeans, because na.rm=TRUE averages only the non-missing values in each row.
For example:
a <- data.frame(High= c(NA, 3, 2), low= c(3, NA, 0))
rowMeans(a, na.rm=TRUE)
#[1] 3 3 1
whereas using + after replacing the NAs with 0 treats the missing values as zeros:
a1 <- replace(a, is.na(a), 0)
(a1[1] + a1[2])/2
# High
#1 1.5
#2 1.5
#3 1.0
NOTE: This is in no way an attempt to tarnish the other answer, which works in most cases and is fast as well.
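The other answer is not reproduced here, so the following is only a sketch of what a plain vectorized + approach presumably looks like, without the NA replacement. It is set next to rowMeans to show how the two differ when a value is missing. The lowercase column names are an assumption for illustration.

```r
a <- data.frame(high = c(NA, 3, 2), low = c(3, NA, 0))

# Vectorized arithmetic: any NA in a row makes that row's result NA
plus_mean <- (a$high + a$low) / 2

# rowMeans with na.rm = TRUE averages whatever values are present
row_mean <- unname(rowMeans(a[, c("high", "low")], na.rm = TRUE))

print(plus_mean)
print(row_mean)
```

Which behaviour is preferable depends on whether a row with a missing high or low should still get a mean at all.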