Using ifelse() to replace NAs in one data frame by referencing another data frame of different length

I already reviewed the following two posts and think they might answer my question, although I'm struggling to see how:

1) Conditional replacement of values in a data.frame
2) Creating a function to replace NAs from one data.frame with values from another

With that said, I'm trying to replace NAs in one data frame by referencing another data frame of a different (shorter) length and pulling in replacement values from column "B" where the values for column "A" in each data frame match.

I've simplified the data below for illustration, but the concept is the same as in the actual data. FYI, as in this example, the real second data frame has no duplicates in column "A".

Here's the first data frame (df1):

> df1
    B          C  A
1  NA 2012-10-01  0
2  NA 2012-10-01  5
3   4 2012-10-01 10
4  NA 2012-10-01 15
5  NA 2012-10-01 20
6  20 2012-10-01 25
7  NA 2012-10-01  0
8  NA 2012-10-01  5
9   5 2012-10-01 10
10  5 2012-10-01 15

> str(df1)
'data.frame':   10 obs. of  3 variables:
 $ B: num  NA NA 4 NA NA 20 NA NA 5 5
 $ C: Factor w/ 1 level "2012-10-01": 1 1 1 1 1 1 1 1 1 1
 $ A: num  0 5 10 15 20 25 0 5 10 15

And here's the second data frame (df2):

> df2
   A         B
1  0 1.7169811
2  5 0.3396226
3 10 0.1320755
4 15 0.1509434
5 20 0.0754717
6 25 2.0943396

> str(df2)
'data.frame':   6 obs. of  2 variables:
 $ A: int  0 5 10 15 20 25
 $ B: num  1.717 0.3396 0.1321 0.1509 0.0755 ...

I think I'm pretty close with the following code:

> ifelse(is.na(df1$B) == TRUE, df2$B[df2$A == df1$A], df1$B)
 [1]  1.7169811  0.3396226  4.0000000  0.1509434  0.0754717 20.0000000         NA         NA
 [9]  5.0000000  5.0000000
Warning message:
In df2$A == df1$A :
  longer object length is not a multiple of shorter object length

Obviously, I want the 7th and 8th output elements to be 1.7169811 and 0.3396226, rather than NAs . . .
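For reference, here is a minimal sketch of where the NAs and the warning come from, using plain vectors copied from the data above. The names df1_A, df2_A, df2_B, cmp and picked are illustrative only. As far as I can tell, `==` compares position by position and recycles the shorter vector, so the logical index has 10 elements while df2$B has only 6:

```r
# Key columns copied from df1 and df2 above
df1_A <- c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
df2_A <- c(0L, 5L, 10L, 15L, 20L, 25L)
df2_B <- c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)

# `==` recycles df2_A to length 10 (hence the warning, since 10 is not
# a multiple of 6) and compares it with df1_A position by position
cmp <- suppressWarnings(df2_A == df1_A)
cmp

# Indexing a length-6 vector with a length-10 logical index:
# positions 7 to 10 fall past the end of df2_B and come back as NA
picked <- df2_B[cmp]
picked
```

Elements 7 and 8 of the ifelse() result are taken from positions 7 and 8 of this index result, which is why they are NA rather than looked-up values.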

Thanks, in advance, for any help, and, once again, thanks for your patience!

Daniel Fletcher asked Jul 20 '14 04:07


2 Answers

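Try the following code, which takes your original statement and makes a small tweak to the TRUE argument of the ifelse() function. Note that df2$B[df2$A %in% df1$A] is recycled by position rather than looked up by key, so this only gives the right answer when df1$A repeats the values of df2$A in the same order, as it does here: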

> df1$B <- ifelse(is.na(df1$B) == TRUE, df2$B[df2$A %in% df1$A], df1$B)   
#                         Switched '==' to '%in%' ---^
> df1
            B          C  A
1   1.7169811 2012-10-01  0
2   0.3396226 2012-10-01  5
3   4.0000000 2012-10-01 10
4   0.1509434 2012-10-01 15
5   0.0754717 2012-10-01 20
6  20.0000000 2012-10-01 25
7   1.7169811 2012-10-01  0
8   0.3396226 2012-10-01  5
9   5.0000000 2012-10-01 10
10  5.0000000 2012-10-01 15
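The result above depends on the row order of df1. A small sketch with made-up data (the two data frames below are hypothetical and not taken from the question) suggests the difference: `%in%` only filters df2$B, which ifelse() then recycles by position, whereas match() looks up the row of df2 for each value of df1$A.

```r
# Hypothetical data: df1$A is NOT in the same order as df2$A
df2 <- data.frame(A = c(0, 5, 10), B = c(1.5, 2.5, 3.5))
df1 <- data.frame(A = c(10, 0, 5, 0), B = c(NA, 7, NA, NA))

# %in% keeps all three rows of df2; ifelse() recycles 1.5, 2.5, 3.5
# across the four positions regardless of what df1$A contains
by_in <- ifelse(is.na(df1$B), df2$B[df2$A %in% df1$A], df1$B)

# match() gives, for each df1$A, the index of the same value in df2$A
by_match <- ifelse(is.na(df1$B), df2$B[match(df1$A, df2$A)], df1$B)

by_in     # 1.5 7.0 3.5 1.5  (rows 1 and 3 get the wrong B)
by_match  # 3.5 7.0 2.5 1.5  (each NA filled from the row with the same A)
```

Swapping the `%in%` subset for a match() lookup keeps the ifelse() structure of the original statement while making the replacement independent of row order.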
ccapizzano answered Oct 03 '22 22:10


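You may also use match(), which finds the row of df2 whose A equals each df1$A that has a missing B, and assign only into the missing positions: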

df1$B[is.na(df1$B)] <- df2$B[match(df1$A[is.na(df1$B)],df2$A)]
df1

#             B          C  A
# 1   1.7169811 2012-10-01  0
# 2   0.3396226 2012-10-01  5
# 3   4.0000000 2012-10-01 10
# 4   0.1509434 2012-10-01 15
# 5   0.0754717 2012-10-01 20
# 6  20.0000000 2012-10-01 25
# 7   1.7169811 2012-10-01  0
# 8   0.3396226 2012-10-01  5
# 9   5.0000000 2012-10-01 10
# 10  5.0000000 2012-10-01 15
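To try this end to end, here is a self-contained sketch that rebuilds both data frames from the values printed in the question and splits the one-liner into steps. The intermediate names missing_B and idx are introduced here for illustration only.

```r
# Data frames rebuilt from the question's printed output
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  C = factor(rep("2012-10-01", 10)),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

missing_B <- is.na(df1$B)                # rows of df1 that need a value
idx <- match(df1$A[missing_B], df2$A)    # matching row of df2 for each one
df1$B[missing_B] <- df2$B[idx]           # fill only the missing positions
df1
```

If a value of df1$A had no counterpart in df2$A, match() would return NA for it and that entry of B would simply stay NA.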
akrun answered Oct 04 '22 00:10