Apply a function to all rows except the current one (dplyr)

Question

I'm learning the dplyr R package and really liking it so far. One thing I need to do is create a new column whose value for each row is the result of applying a function to all rows except the current one (possibly subsetted by group), but I can't figure out a neat way of doing it.

A contrived example would be

library(datasets)
library(dplyr)

data(mtcars)
x <- mtcars %>% mutate(name=rownames(mtcars)) %>% filter(cyl==4) %>% select(name,cyl,mpg)

# This is what I want to do more elegantly
x$othermpg <- NA
for (i in 1:nrow(x))
    x$othermpg[i] <- mean(x$mpg[-i])

Here, the othermpg column gets the mean of the mpg values for all cars except the one in the current row.

Can anyone help, please?

akrun · Accepted Answer

This could be done in data.table

 library(data.table)
 setDT(x)[, N:= 1:.N][, othermpg2:=mean(x[N!= .BY, mpg]), by=N][,N:=NULL]
  x
  #             name cyl  mpg othermpg othermpg2
  #1:     Datsun 710   4 22.8    27.05     27.05
  #2:      Merc 240D   4 24.4    26.89     26.89
  #3:       Merc 230   4 22.8    27.05     27.05
  #4:       Fiat 128   4 32.4    26.09     26.09
  #5:    Honda Civic   4 30.4    26.29     26.29
  #6: Toyota Corolla   4 33.9    25.94     25.94
  #7:  Toyota Corona   4 21.5    27.18     27.18
  #8:      Fiat X1-9   4 27.3    26.60     26.60
  #9:  Porsche 914-2   4 26.0    26.73     26.73
 #10:   Lotus Europa   4 30.4    26.29     26.29
 #11:     Volvo 142E   4 21.4    27.19     27.19

The idea was to first create a row index/column N:=1:.N
Group based on that index by=N]
x[N!= .BY, mpg] gives the rows of mpg that are not equal to the grouping variable
do the mean
N:=NULL As N is not needed, that column is dropped.

Or you could try (Inspired from @thelatemail's answer)

 setDT(x)[, N:=1:.N]
 setkey(x, N)
 x[,othermpg2 := mean(x[!.BY, mpg]), by=N][,N:=NULL]

Or without creating N (from @Jon Clayden's comments)

 setDT(x)[, othermpg2:=mean(x[name!=.BY,mpg]), by=name]

Using dplyr, this seems to work

x %>% 
mutate(N=1:n()) %>% 
do( data.frame(.,othermpg2=sapply(.$N, function(i) mean(.$mpg[!.$N %in% i]))))
  #             name cyl  mpg othermpg  N othermpg2
  #1      Datsun 710   4 22.8    27.05  1     27.05
  #2       Merc 240D   4 24.4    26.89  2     26.89
  #3        Merc 230   4 22.8    27.05  3     27.05
  #4        Fiat 128   4 32.4    26.09  4     26.09
  #5     Honda Civic   4 30.4    26.29  5     26.29
  #6  Toyota Corolla   4 33.9    25.94  6     25.94
  #7   Toyota Corona   4 21.5    27.18  7     27.18
  #8       Fiat X1-9   4 27.3    26.60  8     26.60
  #9   Porsche 914-2   4 26.0    26.73  9     26.73
  #10   Lotus Europa   4 30.4    26.29 10     26.29
  #11     Volvo 142E   4 21.4    27.19 11     27.19

talat · Answer

For the sample case, you can use the following code to avoid a for loop. Basically, instead of excluding the "current" row, you simply subtract it from the total:

library(dplyr)

x %>% mutate(othermpg2 = (sum(mpg)-mpg) / (length(mpg) -1 ))

#             name cyl  mpg othermpg othermpg2
#1      Datsun 710   4 22.8    27.05     27.05
#2       Merc 240D   4 24.4    26.89     26.89
#3        Merc 230   4 22.8    27.05     27.05
#4        Fiat 128   4 32.4    26.09     26.09
#5     Honda Civic   4 30.4    26.29     26.29
#6  Toyota Corolla   4 33.9    25.94     25.94
#7   Toyota Corona   4 21.5    27.18     27.18
#8       Fiat X1-9   4 27.3    26.60     26.60
#9   Porsche 914-2   4 26.0    26.73     26.73
#10   Lotus Europa   4 30.4    26.29     26.29
#11     Volvo 142E   4 21.4    27.19     27.19

In case you feel like this can't be applied to your real data, please edit your question with a more representative example.

Apply a function to all rows except the current one (dplyr)

Tags:

r

dplyr

Jon Clayden

2 Answers

akrun

talat

Recent Activity

Donate For Us

Apply a function to all rows except the current one (dplyr)

Tags:

r

dplyr

Jon Clayden

2 Answers

akrun

talat

Related questions

Recent Activity

Donate For Us