I am trying to calculate descriptive statistics for the birthweight data set (birthwt
) found in RStudio. However, I'm only interested in a few variables: age
, ftv
, ptl
and lwt
.
This is the code I have so far:
library(MASS)
library(dplyr)
data("birthwt")
grouped <- group_by(birthwt, age, ftv, ptl, lwt)
summarise(grouped,
mean = mean(bwt),
median = median(bwt),
SD = sd(bwt))
It gives me a pretty-printed table but only a limited number of the SD is filled and the rest say NA
. I just can't work out why or how to fix it!
Method 1: Using summarise_all() method The summarise_all method in R is used to affect every column of the data frame. The output data frame returns all the columns of the data frame where the specified function is applied over every column.
Summarize Function in R Programming. As its name implies, the summarize function reduces a data frame to a summary of just one vector or value. Many times, these summaries are calculated by grouping observations using a factor or categorical variables first.
The function n() returns the number of observations in a current group.
I stumbled here for another reason and also for me, the answer comes from the docs:
# BEWARE: reusing variables may lead to unexpected results
mtcars %>%
group_by(cyl) %>%
summarise(disp = mean(disp), sd = sd(disp))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#> cyl disp sd
#> <dbl> <dbl> <dbl>
#> 1 4 105. NA
#> 2 6 183. NA
#> 3 8 353. NA
So, in case someone has the same reason as me, instead of reusing a variable, create new ones:
mtcars %>%
group_by(cyl) %>%
summarise(
disp_mean = mean(disp),
disp_sd = sd(disp)
)
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
cyl disp_mean disp_sd
<dbl> <dbl> <dbl>
1 4 105. 26.9
2 6 183. 41.6
3 8 353. 67.8
The number of rows for some of the groups are 1.
grouped %>%
summarise(n = n())
# A tibble: 179 x 5
# Groups: age, ftv, ptl [?]
# age ftv ptl lwt n
# <int> <int> <int> <int> <int>
# 1 14 0 0 135 1
# 2 14 0 1 101 1
# 3 14 2 0 100 1
# 4 15 0 0 98 1
# 5 15 0 0 110 1
# 6 15 0 0 115 1
# 7 16 0 0 110 1
# 8 16 0 0 112 1
# 9 16 0 0 135 2
#10 16 1 0 95 1
According to ?sd
,
The standard deviation of a length-one vector is NA.
This results in NA
values for the sd
where there is only one element
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With