Standard Deviation coming up NA when using summarise() function

Tags:

r

dplyr

standard-deviation

I am trying to calculate descriptive statistics for the birth weight data set (birthwt) from the MASS package. However, I'm only interested in a few variables: age, ftv, ptl and lwt.

This is the code I have so far:

library(MASS)
library(dplyr)
data("birthwt")

grouped <- group_by(birthwt, age, ftv, ptl, lwt)

summarise(grouped, 
          mean = mean(bwt),
          median = median(bwt),
          SD = sd(bwt))

It gives me a pretty-printed table, but only a few of the SD values are filled in and the rest are NA. I just can't work out why or how to fix it!

asked Jan 04 '18 03:01

Angus

2 Answers

I ended up here for a different reason, and in my case the answer comes from the dplyr docs:

# BEWARE: reusing variables may lead to unexpected results
mtcars %>%
    group_by(cyl) %>%
    summarise(disp = mean(disp), sd = sd(disp))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>     cyl  disp    sd
#>   <dbl> <dbl> <dbl>
#> 1     4  105.    NA
#> 2     6  183.    NA
#> 3     8  353.    NA

If you ran into the same problem, create new variables instead of reusing an existing name:

mtcars %>%
    group_by(cyl) %>%
    summarise(
        disp_mean = mean(disp),
        disp_sd = sd(disp)
    )
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>     cyl disp_mean disp_sd
#>   <dbl>     <dbl>   <dbl>
#> 1     4      105.    26.9
#> 2     6      183.    41.6
#> 3     8      353.    67.8
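
The NA in the first version appears because summarise() evaluates its arguments in order: once disp has been overwritten with the group mean, sd(disp) only sees that single value. A minimal sketch to illustrate this, assuming a dplyr version with this sequential behaviour (the name res is only illustrative): computing sd() before the overwrite gives real values even though the name disp is still reused.

```r
library(dplyr)

# sd() runs first, while `disp` still refers to the original column;
# only afterwards is `disp` replaced by the per-group mean.
res <- mtcars %>%
    group_by(cyl) %>%
    summarise(sd = sd(disp), disp = mean(disp))

res
```

Relying on argument order is fragile, so distinct names such as disp_mean and disp_sd remain the safer choice.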

answered Nov 04 '22 01:11

teppo

Some of the groups contain only one row.

grouped %>%
    summarise(n = n())
# A tibble: 179 x 5
# Groups: age, ftv, ptl [?]
#     age   ftv   ptl   lwt     n
#   <int> <int> <int> <int> <int>
# 1    14     0     0   135     1
# 2    14     0     1   101     1
# 3    14     2     0   100     1
# 4    15     0     0    98     1
# 5    15     0     0   110     1
# 6    15     0     0   115     1
# 7    16     0     0   110     1
# 8    16     0     0   112     1
# 9    16     0     0   135     2
#10    16     1     0    95     1

According to ?sd,

The standard deviation of a length-one vector is NA.

So sd() returns NA for every group that contains only one row.
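
A short sketch of both points, in base R so it does not depend on a particular dplyr version. The first part shows the sd() behaviour directly. The second part assumes the goal was to describe age, ftv, ptl and lwt themselves rather than to group by them; that reading of the question is an assumption, and the names vars and stats are only illustrative.

```r
library(MASS)  # provides the birthwt data set

# sd() needs at least two values; a single value gives NA.
one_value  <- sd(135)
two_values <- sd(c(135, 140))

# Assumption: describe the four columns directly, with no grouping,
# so every statistic is computed over all rows of birthwt.
vars  <- c("age", "ftv", "ptl", "lwt")
stats <- sapply(birthwt[vars], function(x) {
    c(mean = mean(x), median = median(x), SD = sd(x))
})

stats  # one column per variable, one row per statistic
```

If grouping really is wanted, grouping by fewer variables gives larger groups and fewer single-row cases.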

answered Nov 04 '22 00:11

akrun
