Using ifelse() to replace NAs in one data frame by referencing another data frame of different length

Tags: dataframe, r, if-statement, na

I already reviewed the following two posts and think they might answer my question, although I'm struggling to see how:

1) Conditional replacement of values in a data.frame
2) Creating a function to replace NAs from one data.frame with values from another

With that said, I'm trying to replace NAs in one data frame by referencing another data frame of a different (shorter) length and pulling in replacement values from column "B" where the values for column "A" in each data frame match.

I've modified the data, below, for simplicity and illustration, although the concept is the same in the actual data. FYI, in the real second data frame, there are also no duplicates in column "A".

Here's the first data frame (df1):

> df1
    B          C  A
1  NA 2012-10-01  0
2  NA 2012-10-01  5
3   4 2012-10-01 10
4  NA 2012-10-01 15
5  NA 2012-10-01 20
6  20 2012-10-01 25
7  NA 2012-10-01  0
8  NA 2012-10-01  5
9   5 2012-10-01 10
10  5 2012-10-01 15

> str(df1)
'data.frame':   10 obs. of  3 variables:
 $ B: num  NA NA 4 NA NA 20 NA NA 5 5
 $ C: Factor w/ 1 level "2012-10-01": 1 1 1 1 1 1 1 1 1 1
 $ A: num  0 5 10 15 20 25 0 5 10 15

And here's the second data frame (df2):

> df2
   A         B
1  0 1.7169811
2  5 0.3396226
3 10 0.1320755
4 15 0.1509434
5 20 0.0754717
6 25 2.0943396

> str(df2)
'data.frame':   6 obs. of  2 variables:
 $ A: int  0 5 10 15 20 25
 $ B: num  1.717 0.3396 0.1321 0.1509 0.0755 ...
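For anyone who wants to run the code in this thread, here is one way to rebuild both data frames from the printed output. This is a sketch under assumptions: the `df2$B` values are the rounded ones shown by `print()`, not the full-precision originals, and `C` is made a factor explicitly because `data.frame()` no longer converts strings to factors by default (R 4.0.0 onward). As `str()` shows, `A` is double in `df1` but integer in `df2`; `==`, `%in%` and `match()` compare them numerically, so that difference is harmless here.

```r
# Rebuild the example data from the printed output (df2$B rounded as displayed).
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  C = factor(rep("2012-10-01", 10)),   # explicit factor, to match str(df1)
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),   # integer key, as in str(df2)
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# Compare against the str() output in the question.
str(df1)
str(df2)
```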

I think I'm pretty close with the following code:

> ifelse(is.na(df1$B) == TRUE, df2$B[df2$A == df1$A], df1$B)
 [1]  1.7169811  0.3396226  4.0000000  0.1509434  0.0754717 20.0000000         NA         NA
 [9]  5.0000000  5.0000000
Warning message:
In df2$A == df1$A :
  longer object length is not a multiple of shorter object length

Obviously, I want the 7th and 8th output elements to be 1.7169811 and 0.3396226 rather than NAs.
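A likely explanation for those two NAs, sketched with the example data re-typed from the printout: `==` compares element by element and recycles the shorter `df2$A` (6 values) to the length of `df1$A` (10 values), which is what the warning is about. In this data every recycled comparison happens to be TRUE, so the result is a length-10 logical index into the length-6 `df2$B`, and positions 7 to 10 fall past the end and come back as NA. `ifelse()` only uses those values where `df1$B` is NA, which is why rows 7 and 8 show NA while rows 9 and 10 keep their 5.

```r
# Example data re-typed from the printed output (df2$B rounded as displayed).
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# df2$A is recycled against df1$A; the length warning is suppressed here.
idx <- suppressWarnings(df2$A == df1$A)
print(idx)        # all ten comparisons are TRUE for this data

# Length-10 logical index into a length-6 vector: elements 7-10 do not exist.
looked_up <- df2$B[idx]
print(looked_up)

# NA only surfaces where df1$B was NA and the lookup ran off the end (rows 7, 8).
result <- ifelse(is.na(df1$B), looked_up, df1$B)
print(result)
```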

Thanks, in advance, for any help, and, once again, thanks for your patience!


asked Jul 20 '14 04:07

Daniel Fletcher

2 Answers

Try the following code, which takes your original statement and makes a small tweak in the TRUE argument of the ifelse function:

> df1$B <- ifelse(is.na(df1$B) == TRUE, df2$B[df2$A %in% df1$A], df1$B)   
#                         Switched '==' to '%in%' ---^
> df1
            B          C  A
1   1.7169811 2012-10-01  0
2   0.3396226 2012-10-01  5
3   4.0000000 2012-10-01 10
4   0.1509434 2012-10-01 15
5   0.0754717 2012-10-01 20
6  20.0000000 2012-10-01 25
7   1.7169811 2012-10-01  0
8   0.3396226 2012-10-01  5
9   5.0000000 2012-10-01 10
10  5.0000000 2012-10-01 15
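A caveat worth checking, offered as a sketch: `df2$A %in% df1$A` returns one TRUE/FALSE per row of `df2`, so the `yes` argument is simply the matching `df2$B` values in `df2`'s own row order, and `ifelse()` recycles them across the rows of `df1`. That lines up in this example only because `df1$A` happens to cycle through 0, 5, 10, ... in the same order as `df2$A`. With a hypothetical `df1` (called `small` below) whose keys come in a different order, the values land on the wrong rows, whereas looking each row's key up with `match()` keeps them aligned.

```r
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)
# Hypothetical df1-like frame whose keys are NOT in df2's order.
small <- data.frame(B = c(NA, NA), A = c(25, 0))

# %in% version: picks df2$B for A = 0 and A = 25, in df2's order,
# so the row with A = 25 receives the value belonging to A = 0 (and vice versa).
by_in <- ifelse(is.na(small$B), df2$B[df2$A %in% small$A], small$B)

# match() version: looks up each row's own A in df2$A.
by_match <- ifelse(is.na(small$B), df2$B[match(small$A, df2$A)], small$B)

print(by_in)
print(by_match)
```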

answered Oct 03 '22 22:10

ccapizzano

You may also use:

df1$B[is.na(df1$B)] <- df2$B[match(df1$A[is.na(df1$B)],df2$A)]
df1

#             B          C  A
# 1   1.7169811 2012-10-01  0
# 2   0.3396226 2012-10-01  5
# 3   4.0000000 2012-10-01 10
# 4   0.1509434 2012-10-01 15
# 5   0.0754717 2012-10-01 20
# 6  20.0000000 2012-10-01 25
# 7   1.7169811 2012-10-01  0
# 8   0.3396226 2012-10-01  5
# 9   5.0000000 2012-10-01 10
# 10  5.0000000 2012-10-01 15
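Step by step: `match(x, table)` returns, for each element of `x`, the position of its first match in `table` (NA if there is none), so the assignment above only touches the NA rows and pulls the `df2$B` value that belongs to each row's own key. A sketch with the example data re-typed from the question (the `miss` and `pos` names are just for illustration):

```r
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

miss <- is.na(df1$B)               # rows that need filling (1, 2, 4, 5, 7, 8)
pos  <- match(df1$A[miss], df2$A)  # row of df2 holding each missing row's key
print(pos)                         # 1 2 4 5 1 2

df1$B[miss] <- df2$B[pos]          # non-NA rows are left untouched
print(df1)

# A key with no counterpart in df2 gives NA, so such a row would simply stay NA.
print(match(c(0, 30), df2$A))      # 1 NA
```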

answered Oct 04 '22 00:10

akrun
