I would like to use stargazer to produce summary statistics for each category of a grouping variable. I could do it in separate tables, but I'd like it all in one – if that is not unreasonably challenging for this package.
For example,
library(stargazer)
stargazer(ToothGrowth, type = "text")
#>
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 60 18.813 7.649 4.200 33.900
#> dose 60 1.167 0.629 0.500 2.000
#> -----------------------------------------
provides summary statistics for the continuous variables in ToothGrowth. I would like to split that summary by the categorical variable supp, also in ToothGrowth.
Two suggestions for the desired outcome:
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> OJ len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> VC len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> --------------------------------------------------
#>
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> len
#> _by OJ 30 20.663 6.606 8.200 30.900
#> _by VC 30 16.963 8.266 4.200 33.900
#> _tot 60 18.813 7.649 4.200 33.900
#>
#> dose
#> _by OJ 30 1.167 0.634 0.500 2.000
#> _by VC 30 1.167 0.634 0.500 2.000
#> _tot 60 1.167 0.629 0.500 2.000
#> --------------------------------------------------
Proportions are often used to summarize categorical data and can be calculated by dividing individual frequencies by the total number of responses. In Python/pandas, df['column_name'].value_counts(normalize=True) ignores missing data and divides the frequency of each category by the total count across all categories.
Descriptive statistics used to analyse data for a single categorical variable include frequencies, percentages, fractions and/or relative frequencies (which are simply frequencies divided by the sample size), all obtained from the variable's frequency distribution table.
The best way to summarize categorical data is to use frequencies and percentages (or proportions). A proportion is a fraction or part of the total that possesses a certain characteristic.
The basic statistics available for categorical variables are counts and percentages. A count is the number of cases in each cell of the table, or the number of responses for multiple response sets. If weighting is in effect, this value is the weighted count.
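Since the rest of this thread is in R, here is a rough R counterpart of the counts-and-proportions idea, applied to the supp column of ToothGrowth. It is only a sketch: the names `counts`, `props`, and `freq` are mine, and it relies on table() dropping missing values by default.

```r
# Frequencies of each supp level; table() drops NA values by default
counts <- table(ToothGrowth$supp)

# Relative frequencies: each count divided by the total count
props <- prop.table(counts)

# Collect both in one small data frame for display
freq <- data.frame(
  supp       = names(counts),
  n          = as.vector(counts),
  proportion = as.vector(props)
)
print(freq)
```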
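library(stargazer)
library(dplyr)
library(tidyr)
ToothGrowth %>%
  group_by(supp) %>%
  mutate(id = 1:n()) %>%                   # row id within each supp group
  ungroup() %>%
  gather(temp, val, len, dose) %>%         # long format: one row per value
  unite(temp1, supp, temp, sep = '_') %>%  # column keys such as OJ_len
  spread(temp1, val) %>%                   # one column per group/variable pair
  select(-id) %>%                          # drop the helper id
  as.data.frame() %>%
  stargazer(type = 'text')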
=========================================
Statistic N Mean St. Dev. Min Max
-----------------------------------------
OJ_dose 30 1.167 0.634 0.500 2.000
OJ_len 30 20.663 6.606 8.200 30.900
VC_dose 30 1.167 0.634 0.500 2.000
VC_len 30 16.963 8.266 4.200 33.900
-----------------------------------------
This addresses the problem mentioned by the OP in a comment to the original answer, "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I saw to do that with stargazer was to create a new data frame that had variables for each group's observations using a gather(), unite(), spread() strategy. The only trick is to give each row a unique identifier within its group, so that spread() does not run into duplicate identifiers, and to drop that variable before calling stargazer().
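For what it's worth, gather() and spread() are superseded in current tidyr, and the same reshaping can be sketched with pivot_longer() and pivot_wider(). This is a minimal sketch assuming tidyr 1.0.0 or later; the names `wide`, `variable`, and `value` are placeholders of mine, not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(stargazer)

wide <- ToothGrowth %>%
  group_by(supp) %>%
  mutate(id = row_number()) %>%   # row id within each supp group
  ungroup() %>%
  # long format: one row per (id, supp, variable) combination
  pivot_longer(c(len, dose), names_to = "variable", values_to = "value") %>%
  # one column per supp/variable pair, e.g. OJ_len (default separator is "_")
  pivot_wider(names_from = c(supp, variable), values_from = value) %>%
  select(-id) %>%                 # drop the helper id
  as.data.frame()

stargazer(wide, type = "text")
```

The column order in the printed table may differ from the gather()/spread() version, since pivot_wider() keeps columns in order of first appearance unless names_sort = TRUE is set.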
Three possible solutions: one using reporttools and xtable, one using tidyverse tools along with stargazer, and a third using only base R.
I suggest taking a look at reporttools. It means leaving stargazer behind, but I think it is worth a look:
# install.packages("reporttools")  # run this once to install the package
library(reporttools)
vars <- ToothGrowth[, c('len', 'dose')]
group <- ToothGrowth[, 'supp']
## display the default statistics for len and dose, grouped by supp
tableContinuous(vars = vars, group = group, prec = 1, cap = "Table of 'len','dose' by 'supp' ", lab = "tab: descr stat")
% latex table generated in R 3.3.3 by xtable 1.8-2 package
\begingroup\footnotesize
\begin{longtable}{llrrrrrrrrrr}
\textbf{Variable} & \textbf{Levels} & $\mathbf{n}$ & \textbf{Min} & $\mathbf{q_1}$ & $\mathbf{\widetilde{x}}$ & $\mathbf{\bar{x}}$ & $\mathbf{q_3}$ & \textbf{Max} & $\mathbf{s}$ & \textbf{IQR} & \textbf{\#NA} \\
\hline
len & OJ & 30 & 8.2 & 15.5 & 22.7 & 20.7 & 25.7 & 30.9 & 6.6 & 10.2 & 0 \\
& VC & 30 & 4.2 & 11.2 & 16.5 & 17.0 & 23.1 & 33.9 & 8.3 & 11.9 & 0 \\
\hline
& all & 60 & 4.2 & 13.1 & 19.2 & 18.8 & 25.3 & 33.9 & 7.6 & 12.2 & 0 \\
\hline
dose & OJ & 30 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
& VC & 30 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
\hline
& all & 60 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
\hline
\hline
\caption{Table of 'len','dose' by 'supp' }
\label{tab: descr stat}
\end{longtable}
\endgroup
Compiled in LaTeX, this gives a nicely formatted table.
Using tidyverse tools along with stargazer, inspired by this SO answer:
# install.packages(c("tidyverse"), dependencies = TRUE)
library(dplyr); library(purrr); library(stargazer)
ToothGrowth %>% split(.$supp) %>% walk(~ stargazer(., type = "text"))
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#>
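The two tables above come out unlabelled, so it is hard to tell which group is which. One possible tweak, sketched here on the assumption that purrr's iwalk() and stargazer's title argument behave as documented, is to pass each group name as the table title. `groups` is just an illustrative name.

```r
library(purrr)
library(stargazer)

# One data frame per level of supp, named "OJ" and "VC"
groups <- split(ToothGrowth, ToothGrowth$supp)

# iwalk() passes the element as .x and its name as .y,
# so each table can carry its group label as a title
iwalk(groups, ~ stargazer(.x, type = "text", title = paste("supp =", .y)))
```

This still prints separate tables, so it only helps with labelling, not with the single-table requirement.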
A solution using only base R:
by(ToothGrowth, ToothGrowth$supp, stargazer, type = 'text')
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#>
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#> ToothGrowth$supp: OJ
#> [1] ""
#> [2] "========================================="
#> [3] "Statistic N Mean St. Dev. Min Max "
#> [4] "-----------------------------------------"
#> [5] "len 30 20.663 6.606 8.200 30.900"
#> [6] "dose 30 1.167 0.634 0.500 2.000 "
#> [7] "-----------------------------------------"
#> ---------------------------------------------------------------
#> ToothGrowth$supp: VC
#> [1] ""
#> [2] "========================================="
#> [3] "Statistic N Mean St. Dev. Min Max "
#> [4] "-----------------------------------------"
#> [5] "len 30 16.963 8.266 4.200 33.900"
#> [6] "dose 30 1.167 0.634 0.500 2.000 "
#> [7] "-----------------------------------------"
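The by() call above still prints one table per group. If a single table is the goal, one option is to compute the statistics per group in base R and hand the finished data frame to stargazer with summary = FALSE, so that it is printed as-is. This is only a sketch: `summarise_group`, `parts`, and `tab` are hypothetical names, and the choice of statistics simply mirrors stargazer's default columns.

```r
library(stargazer)

# Summary statistics for the chosen numeric columns of one group
# (helper name and default columns are illustrative)
summarise_group <- function(d, vars = c("len", "dose")) {
  stat <- function(f) unname(sapply(vars, function(v) f(d[[v]])))
  data.frame(
    Statistic = vars,
    N    = stat(length),
    Mean = stat(mean),
    SD   = stat(sd),
    Min  = stat(min),
    Max  = stat(max)
  )
}

# One summary data frame per level of supp
parts <- lapply(split(ToothGrowth, ToothGrowth$supp), summarise_group)

# Stack the per-group results and record which group each row belongs to
tab <- do.call(rbind, unname(parts))
tab <- cbind(supp = rep(names(parts), times = sapply(parts, nrow)), tab)

# summary = FALSE prints the data frame itself instead of summarising it
stargazer(tab, type = "text", summary = FALSE, rownames = FALSE, digits = 3)
```

Because the table is built by hand, extra rows (for example an overall "all" row) can be appended to `tab` before printing.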