Problem with data frame transformation using dplyr package

Question

Problem

Consider two data frames: the first contains only 0s and 1s, and the second contains the data:

set.seed(20)
df<-data.frame(sample(0:1,5,T),sample(0:1,5,T),sample(0:1,5,T))

#zero_one data frame
  sample.0.1..5..T. sample.0.1..5..T..1 sample.0.1..5..T..2
1                 0                   1                   0
2                 1                   0                   0
3                 1                   1                   1
4                 0                   0                   0
5                 1                   0                   1
df1<-data.frame(append(rnorm(4),10),append(runif(4),-5),append(rexp(4),20))

#with data
  append.rnorm.4...10. append.runif.4....5. append.rexp.4...20.
1           0.08609139            0.2374272           0.3341095
2          -0.63778176            0.2297862           0.7537732
3           0.22642990            0.9447793           1.3011998
4          -0.05418293            0.8448115           1.2097271
5          10.00000000           -5.0000000          20.0000000

For each column, I want to replace the values in the second data frame at the positions where the first data frame is 0 with the mean of the values at the positions where the first data frame is 1.

Example

In the first column, I want to replace 0.08609139 and -0.05418293 (the values where the first column of the first data frame is 0) with mean(c(-0.63778176, 0.22642990, 10.00000000)) (the mean of the values where the first column of the first data frame is 1).
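To make the target concrete, here is a minimal sketch of the first-column calculation in isolation. The vector names `mask`, `values`, `replacement` and `result` are illustrative and do not appear in the original post; the numbers are copied from the printed output above.

```r
# First columns of the two data frames, copied from the printed output.
mask   <- c(0, 1, 1, 0, 1)
values <- c(0.08609139, -0.63778176, 0.22642990, -0.05418293, 10)

# mean() averages only its first argument, so select the values as one vector.
replacement <- mean(values[mask == 1])

# Keep the values where mask is 1 and substitute the mean where mask is 0.
result <- ifelse(mask == 0, replacement, values)
print(result)
```

Positions 2, 3 and 5 keep their original values, while positions 1 and 4 both receive the same mean.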

I want to do this using the mutate_all() function from the dplyr package.

My work so far

  df1<-df1 %>% mutate_all(
      function(x) ifelse(df[x]==0, mean(x[df==1],na.rm=T,x)))

I know that the condition df[x] is meaningless, but I have no idea what I should put there. Could you please help me with that?

Ben · Accepted Answer

You could follow @deschen's suggestion and multiply the two data frames together.

Here is another approach to consider, using mapply. For each column, identify the positions (indices) in df where the value is zero.

Then replace the values of the corresponding df1 column at those positions with the mean of the remaining values in that column. y[-idx] selects all values in the df1 column except those at the positions in idx.
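The indexing described above can be tried on a small vector first. This is only a sketch with made-up numbers; `x`, `y`, `rest`, `none` and `z` are illustrative names.

```r
x <- c(0, 1, 1, 0, 1)        # indicator column
y <- c(10, 20, 30, 40, 50)   # data column

idx  <- which(x == 0)        # positions where the indicator is 0
rest <- y[-idx]              # negative indices drop those positions
print(idx)
print(rest)

# Edge case: with no zeros, which() returns integer(0) and y[-idx] is empty,
# so the mean is NaN, but the assignment touches no elements.
none <- which(c(1, 1, 1) == 0)
z <- c(1, 2, 3)
z[none] <- mean(z[-none])
print(z)
```

The edge case is worth knowing because negating an empty index selects nothing rather than everything; here it is harmless because nothing is assigned either.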

Note that my seed is different: when I used your seed of 20, I got different values, including a column of all zeroes. Please let me know if you are able to reproduce this.

set.seed(12)

df<-data.frame(sample(0:1,5,T),sample(0:1,5,T),sample(0:1,5,T))
df1<-data.frame(append(rnorm(4),10),append(runif(4),-5),append(rexp(4),20))

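# x is a 0/1 column from df, y is the matching data column from df1
my_fun <- function(x, y) {
  idx <- which(x == 0)      # positions where the indicator is 0
  y[idx] <- mean(y[-idx])   # overwrite them with the mean of the remaining values
  return(y)
}

# Apply my_fun to each pair of columns; the result is a numeric matrix
mapply(my_fun, df, df1)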
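Since the question asked for a dplyr solution, here is a hedged sketch of the same idea using across() and cur_column(), which assumes dplyr 1.0.0 or later. It also assumes that both data frames share column names; the names `mask`, `values`, `fill_with_mean` and the columns a, b, c are illustrative, and the numbers are copied from the output printed in the question rather than generated from a seed.

```r
library(dplyr)

# Indicator data frame (0/1) and data frame with values, same column names.
mask <- data.frame(a = c(0, 1, 1, 0, 1),
                   b = c(1, 0, 1, 0, 0),
                   c = c(0, 0, 1, 0, 1))
values <- data.frame(
  a = c(0.08609139, -0.63778176, 0.22642990, -0.05418293, 10),
  b = c(0.2374272, 0.2297862, 0.9447793, 0.8448115, -5),
  c = c(0.3341095, 0.7537732, 1.3011998, 1.2097271, 20)
)

# Replace entries flagged 0 with the mean of the entries flagged 1.
fill_with_mean <- function(flag, x) {
  ifelse(flag == 0, mean(x[flag == 1], na.rm = TRUE), x)
}

# cur_column() returns the name of the column currently being processed,
# which is used here to pull the matching column out of `mask`.
result <- values %>%
  mutate(across(everything(), ~ fill_with_mean(mask[[cur_column()]], .x)))

print(result)
```

Unlike mapply, this returns a data frame. If the two data frames do not share column names, as in the original post, a base R alternative that pairs columns by position is values[] <- Map(fill_with_mean, mask, values).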

Tags: dataframe, r, dplyr

John
