The difference between na.rm and na.omit in R

Tags:

r

na

I've just started with R and I've executed these statements:

library(datasets)
head(airquality)
s <- split(airquality, airquality$Month)
sapply(s, function(x) {colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)})
lapply(s, function(x) {colMeans(na.omit(x[, c("Ozone", "Solar.R", "Wind")]))})

The sapply call returns the following:

             5         6          7          8         9
Ozone    23.61538  29.44444  59.115385  59.961538  31.44828
Solar.R 181.29630 190.16667 216.483871 171.857143 167.43333
Wind     11.62258  10.26667   8.941935   8.793548  10.18000

And the lapply call returns the following:

$`5`
    Ozone   Solar.R      Wind 
 24.12500 182.04167  11.50417 

$`6`
    Ozone   Solar.R      Wind 
 29.44444 184.22222  12.17778 

$`7`
     Ozone    Solar.R       Wind 
 59.115385 216.423077   8.523077 

$`8`
    Ozone   Solar.R      Wind 
 60.00000 173.08696   8.86087 

$`9`
    Ozone   Solar.R      Wind 
 31.44828 168.20690  10.07586

Now my question is: why are the returned values similar but not the same? Aren't na.rm = TRUE and na.omit supposed to do exactly the same thing, i.e. omit the missing values and calculate the mean only from the values we have? In that case, shouldn't both result sets contain the same values?

Thank you so much for any input!


asked Jan 11 '17 10:01

raluca

1 Answer

They are not supposed to give the same result. Consider this example:

exdf <- data.frame(a = c(1, NA, 5), b = c(3, 2, 2))
#   a b
#1  1 3
#2 NA 2
#3  5 2
colMeans(exdf, na.rm = TRUE)
#       a        b 
#3.000000 2.333333
colMeans(na.omit(exdf))
#  a   b 
#3.0 2.5

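Why is this? With na.rm = TRUE, colMeans drops missing values column by column, so the mean of column b is calculated as (3+2+2)/3. na.omit, on the other hand, removes every row that contains an NA in any column before colMeans ever sees the data. The second row is therefore removed in its entirety, including its value of b, which is not NA and was counted in the first case, so the mean of b is just (3+2)/2. The same thing happens with airquality: na.omit drops a day's Wind value whenever that day's Ozone or Solar.R is missing, which is why even the Wind means differ between your two results.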
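As a minimal sketch of the two behaviours, the R code below reproduces both results with explicit base-R operations. It assumes that na.rm = TRUE is equivalent to removing NAs within each column separately, and that na.omit on a data frame is equivalent to keeping only the rows for which complete.cases() (from the stats package) is TRUE. Rather than taking this on faith, the code checks both equivalences with all.equal. The variable names per_column, row_wise and dropped are illustrative and are not part of the original answer.

```r
library(datasets)

# Toy data from the example above: one NA in column a, none in column b
exdf <- data.frame(a = c(1, NA, 5), b = c(3, 2, 2))

# na.rm = TRUE behaviour: each column discards only its own NAs
per_column <- sapply(exdf, function(col) mean(col[!is.na(col)]))

# na.omit behaviour: discard every row with an NA in any column, then average
row_wise <- colMeans(exdf[complete.cases(exdf), ])

# Both lines print TRUE if the equivalences hold
print(isTRUE(all.equal(per_column, colMeans(exdf, na.rm = TRUE))))
print(isTRUE(all.equal(row_wise, colMeans(na.omit(exdf)))))

# Applied to the question's data: how many days per month na.omit discards
cols <- c("Ozone", "Solar.R", "Wind")
s <- split(airquality[, cols], airquality$Month)
dropped <- sapply(s, function(x) sum(!complete.cases(x)))
print(dropped)
```

If you want the na.rm = TRUE call to reproduce the na.omit numbers, you can apply it to x[complete.cases(x), ] instead of x. Which behaviour is appropriate depends on whether a day with a missing Ozone reading should still contribute its Solar.R and Wind values to the monthly means.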

answered Sep 23 '22 18:09

nicola
