R: replace NA with item from vector

Tags: replace, r, missing-data, imputation

I am trying to replace some missing values in my data with the average values from a similar group.

My data looks like this:

   X   Y
1  x   y
2  x   y
3  NA  y
4  x   y

And I want it to look like this:

  X   Y
1  x   y
2  x   y
3  y   y
4  x   y

I wrote this, and it worked:

# Loop over every row and copy Y into X wherever X is missing
for (i in seq_len(nrow(data.frame))) {
   if (is.na(data.frame$X[i])) {
       data.frame$X[i] <- data.frame$Y[i]
   }
}

But my data.frame is almost half a million lines long, and the for/if statements are pretty slow. What I want is something like

is.na(data.frame$X) <- data.frame$Y

But this gets a mismatched size error. It seems like there should be a command that does this, but I cannot find it here on SO or on the R help list. Any ideas?

645

asked Jul 13 '11 19:07

gregmacfarlane

1 Answer

ifelse is your friend.

Using Dirk's dataset

df <- within(df, X <- ifelse(is.na(X), Y, X))
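
The dataset referred to above is not shown on this page, so here is a minimal sketch with a small made-up data frame (the values and the names df1, df2 and missing are my own assumptions, not the original data). It runs the ifelse() line and, for comparison, an equivalent replacement using a logical index. Both are vectorized, so neither needs a row-by-row loop.

```r
# Hypothetical stand-in data; not the dataset from the original thread
df <- data.frame(
  X = c("a", "b", NA, "d"),
  Y = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE  # keep character columns; see the note below
)

# The ifelse() approach: take Y where X is NA, otherwise keep X
df1 <- within(df, X <- ifelse(is.na(X), Y, X))

# Same result with logical indexing: overwrite only the missing positions
df2 <- df
missing <- is.na(df2$X)
df2$X[missing] <- df2$Y[missing]

print(df1)
```

One caveat: if X and Y are factors, ifelse() returns the underlying integer codes rather than the labels, so it is probably safest to convert both columns with as.character() first.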

198

answered Sep 28 '22 04:09

Richie Cotton
