I'm learning the dplyr
R package and really liking it so far. One thing I need to do is create a new column whose value for each row is the result of applying a function to all rows except the current one (possibly subsetted by group), but I can't figure out a neat way of doing it.
A contrived example would be
library(datasets)
library(dplyr)
data(mtcars)
x <- mtcars %>% mutate(name=rownames(mtcars)) %>% filter(cyl==4) %>% select(name,cyl,mpg)
# This is what I want to do more elegantly
x$othermpg <- NA
for (i in 1:nrow(x))
x$othermpg[i] <- mean(x$mpg[-i])
Here, the othermpg
column gets the mean of the mpg
values for all cars except the one in the current row.
Can anyone help, please?
This could be done in data.table
library(data.table)
setDT(x)[, N:= 1:.N][, othermpg2:=mean(x[N!= .BY, mpg]), by=N][,N:=NULL]
x
# name cyl mpg othermpg othermpg2
#1: Datsun 710 4 22.8 27.05 27.05
#2: Merc 240D 4 24.4 26.89 26.89
#3: Merc 230 4 22.8 27.05 27.05
#4: Fiat 128 4 32.4 26.09 26.09
#5: Honda Civic 4 30.4 26.29 26.29
#6: Toyota Corolla 4 33.9 25.94 25.94
#7: Toyota Corona 4 21.5 27.18 27.18
#8: Fiat X1-9 4 27.3 26.60 26.60
#9: Porsche 914-2 4 26.0 26.73 26.73
#10: Lotus Europa 4 30.4 26.29 26.29
#11: Volvo 142E 4 21.4 27.19 27.19
N:=1:.N
by=N]
x[N!= .BY, mpg]
gives the rows of mpg
that are not equal to the grouping variableN:=NULL
As N is not needed, that column is dropped.Or you could try (Inspired from @thelatemail's answer)
setDT(x)[, N:=1:.N]
setkey(x, N)
x[,othermpg2 := mean(x[!.BY, mpg]), by=N][,N:=NULL]
Or without creating N
(from @Jon Clayden's comments)
setDT(x)[, othermpg2:=mean(x[name!=.BY,mpg]), by=name]
Using dplyr
, this seems to work
x %>%
mutate(N=1:n()) %>%
do( data.frame(.,othermpg2=sapply(.$N, function(i) mean(.$mpg[!.$N %in% i]))))
# name cyl mpg othermpg N othermpg2
#1 Datsun 710 4 22.8 27.05 1 27.05
#2 Merc 240D 4 24.4 26.89 2 26.89
#3 Merc 230 4 22.8 27.05 3 27.05
#4 Fiat 128 4 32.4 26.09 4 26.09
#5 Honda Civic 4 30.4 26.29 5 26.29
#6 Toyota Corolla 4 33.9 25.94 6 25.94
#7 Toyota Corona 4 21.5 27.18 7 27.18
#8 Fiat X1-9 4 27.3 26.60 8 26.60
#9 Porsche 914-2 4 26.0 26.73 9 26.73
#10 Lotus Europa 4 30.4 26.29 10 26.29
#11 Volvo 142E 4 21.4 27.19 11 27.19
For the sample case, you can use the following code to avoid a for loop. Basically, instead of excluding the "current" row, you simply subtract it from the total:
library(dplyr)
x %>% mutate(othermpg2 = (sum(mpg)-mpg) / (length(mpg) -1 ))
# name cyl mpg othermpg othermpg2
#1 Datsun 710 4 22.8 27.05 27.05
#2 Merc 240D 4 24.4 26.89 26.89
#3 Merc 230 4 22.8 27.05 27.05
#4 Fiat 128 4 32.4 26.09 26.09
#5 Honda Civic 4 30.4 26.29 26.29
#6 Toyota Corolla 4 33.9 25.94 25.94
#7 Toyota Corona 4 21.5 27.18 27.18
#8 Fiat X1-9 4 27.3 26.60 26.60
#9 Porsche 914-2 4 26.0 26.73 26.73
#10 Lotus Europa 4 30.4 26.29 26.29
#11 Volvo 142E 4 21.4 27.19 27.19
In case you feel like this can't be applied to your real data, please edit your question with a more representative example.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With