
Obtaining Separate Summary Statistics by Categorical Variable with Stargazer Package

Tags: r, summary, stargazer

I would like to use stargazer to produce summary statistics for each category of a grouping variable. I could do it in separate tables, but I'd like it all in one – if that is not unreasonably challenging for this package.

For example,

library(stargazer)
stargazer(ToothGrowth, type = "text")
#> 
#> =========================================
#> Statistic N   Mean  St. Dev.  Min   Max  
#> -----------------------------------------
#> len       60 18.813  7.649   4.200 33.900
#> dose      60 1.167   0.629   0.500 2.000 
#> -----------------------------------------

provides summary statistics for the continuous variables in ToothGrowth. I would like to split that summary by the categorical variable supp, also in ToothGrowth.

Two suggestions for the desired outcome (the formula call is only a mock-up of what I would like to be able to write):

stargazer(ToothGrowth ~ supp, type = "text")
#> 
#> ==================================================
#> Statistic         N   Mean   St. Dev.  Min   Max  
#> --------------------------------------------------
#> OJ       len       30 20.663  6.606   8.200 30.900
#>          dose      30  1.167  0.634   0.500  2.000
#> VC       len       30 16.963  8.266   4.200 33.900
#>          dose      30  1.167  0.634   0.500  2.000
#> --------------------------------------------------
#> 
stargazer(ToothGrowth ~ supp, type = "text")
#> 
#> ==================================================
#> Statistic          N   Mean   St. Dev.  Min   Max  
#> --------------------------------------------------
#> len               
#>        _by OJ     30 20.663  6.606   8.200 30.900
#>        _by VC     30 16.963  8.266   4.200 33.900
#> _tot              60 18.813  7.649   4.200 33.900
#> 
#> dose             
#>        _by OJ     30  1.167  0.634   0.500  2.000
#>        _by VC     30  1.167  0.634   0.500  2.000
#> _tot              60  1.167  0.629   0.500  2.000
#> --------------------------------------------------


asked Aug 19 '14 17:08

Michael

2 Answers

Solution

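library(stargazer)
library(dplyr)
library(tidyr)

ToothGrowth %>%
    # number the observations within each supp group
    group_by(supp) %>%
    mutate(id = 1:n()) %>%
    ungroup() %>%
    # stack len and dose into one value column
    gather(temp, val, len, dose) %>%
    # build column names such as OJ_len and VC_dose
    unite(temp1, supp, temp, sep = '_') %>%
    # one column per group/variable combination
    spread(temp1, val) %>%
    # drop the helper identifier before summarising
    select(-id) %>%
    as.data.frame() %>%
    stargazer(type = 'text')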

Result

=========================================
Statistic N   Mean  St. Dev.  Min   Max  
-----------------------------------------
OJ_dose   30 1.167   0.634   0.500 2.000 
OJ_len    30 20.663  6.606   8.200 30.900
VC_dose   30 1.167   0.634   0.500 2.000 
VC_len    30 16.963  8.266   4.200 33.900
-----------------------------------------

Explanation

This addresses the problem the OP raised in a comment on the original answer: "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I saw to do that with stargazer was to build a new data frame with one column per group and variable, using a gather(), unite(), spread() strategy. The only trick is that spread() needs each row to be uniquely identified, so the code creates an identifier within each group and drops that variable again before calling stargazer().
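The same reshaping can also be sketched with pivot_wider(), which tidyr offers as the newer counterpart to spread(). This is only an illustration of the idea above, not a replacement for it; it assumes tidyr 1.1.0 or later (for the names_glue argument), and the object name wide is made up for the example.

```r
library(stargazer)
library(dplyr)
library(tidyr)

# Assumption: tidyr >= 1.1.0, which provides names_glue in pivot_wider().
wide <- ToothGrowth %>%
    group_by(supp) %>%
    mutate(id = row_number()) %>%   # unique identifier within each group
    ungroup() %>%
    pivot_wider(
        id_cols     = id,
        names_from  = supp,
        values_from = c(len, dose),
        names_glue  = "{supp}_{.value}"   # e.g. OJ_len, VC_dose
    ) %>%
    select(-id) %>%                 # drop the helper identifier
    as.data.frame()                 # stargazer expects a plain data frame

stargazer(wide, type = "text")
```

The resulting table should contain the same four rows as the one above, though the row order may differ.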


answered Oct 30 '22 05:10

duckmayr

Three possible solutions: one using reporttools and xtable, one using tidyverse tools along with stargazer, and a third using only base R.

First,

I suggest taking a look at reporttools. It means leaving stargazer behind, but I think it is worth a look:

# install.packages("reporttools")  # run this once to install the package
library(reporttools)

vars <- ToothGrowth[, c('len', 'dose')]   # continuous variables to summarise
group <- ToothGrowth[, c('supp')]         # grouping factor

## display default statistics, grouped by supp
tableContinuous(vars = vars, group = group, prec = 1, cap = "Table of 'len','dose' by 'supp' ", lab = "tab: descr stat")

% latex table generated in R 3.3.3 by xtable 1.8-2 package
\begingroup\footnotesize
\begin{longtable}{llrrrrrrrrrr}
 \textbf{Variable} & \textbf{Levels} & $\mathbf{n}$ & \textbf{Min} & $\mathbf{q_1}$ & $\mathbf{\widetilde{x}}$ & $\mathbf{\bar{x}}$ & $\mathbf{q_3}$ & \textbf{Max} & $\mathbf{s}$ & \textbf{IQR} & \textbf{\#NA} \\ 
  \hline
len & OJ & 30 & 8.2 & 15.5 & 22.7 & 20.7 & 25.7 & 30.9 & 6.6 & 10.2 & 0 \\ 
   & VC & 30 & 4.2 & 11.2 & 16.5 & 17.0 & 23.1 & 33.9 & 8.3 & 11.9 & 0 \\ 
   \hline
 & all & 60 & 4.2 & 13.1 & 19.2 & 18.8 & 25.3 & 33.9 & 7.6 & 12.2 & 0 \\ 
   \hline
dose & OJ & 30 & 0.5 &  0.5 &  1.0 &  1.2 &  2.0 &  2.0 & 0.6 &  1.5 & 0 \\ 
   & VC & 30 & 0.5 &  0.5 &  1.0 &  1.2 &  2.0 &  2.0 & 0.6 &  1.5 & 0 \\ 
   \hline
 & all & 60 & 0.5 &  0.5 &  1.0 &  1.2 &  2.0 &  2.0 & 0.6 &  1.5 & 0 \\ 
   \hline
\hline
\caption{Table of 'len','dose' by 'supp' } 
\label{tab: descr stat}
\end{longtable}
\endgroup

Compiled with LaTeX, this gives a nicely formatted table (image: LaTeX output with reporttools).

Second,

Using tidyverse tools along with stargazer, inspired by this SO answer:

# install.packages(c("tidyverse"), dependencies = TRUE)
library(stargazer); library(dplyr); library(purrr)
ToothGrowth %>% split(.$supp) %>% walk(~ stargazer(., type = "text"))
#> =========================================
#> Statistic N   Mean  St. Dev.  Min   Max  
#> -----------------------------------------
#> len       30 20.663  6.606   8.200 30.900
#> dose      30 1.167   0.634   0.500 2.000 
#> -----------------------------------------
#> =========================================
#> Statistic N   Mean  St. Dev.  Min   Max  
#> -----------------------------------------
#> len       30 16.963  8.266   4.200 33.900
#> dose      30 1.167   0.634   0.500 2.000 
#> -----------------------------------------
#>
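The two tables above are not labelled, so it is not obvious which one belongs to OJ and which to VC. As a small sketch (assuming purrr's iwalk(), which passes each list element together with its name, and stargazer's title argument), each table can carry its group name. The object name groups is only for illustration.

```r
library(stargazer)
library(purrr)

# split() names the list elements after the levels of supp
groups <- split(ToothGrowth, ToothGrowth$supp)

# .x is the group's data frame, .y is the group's name
iwalk(groups, ~ stargazer(.x, type = "text", title = paste("supp =", .y)))
```

This still produces one table per group rather than a single combined table.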

Third,

a solution that splits the data with base R only:

by(ToothGrowth, ToothGrowth$supp, stargazer, type = 'text')
    #> =========================================
    #> Statistic N   Mean  St. Dev.  Min   Max  
    #> -----------------------------------------
    #> len       30 20.663  6.606   8.200 30.900
    #> dose      30 1.167   0.634   0.500 2.000 
    #> -----------------------------------------
    #> 
    #> =========================================
    #> Statistic N   Mean  St. Dev.  Min   Max  
    #> -----------------------------------------
    #> len       30 16.963  8.266   4.200 33.900
    #> dose      30 1.167   0.634   0.500 2.000 
    #> -----------------------------------------
    #> ToothGrowth$supp: OJ
    #> [1] ""                                         
    #> [2] "========================================="
    #> [3] "Statistic N   Mean  St. Dev.  Min   Max  "
    #> [4] "-----------------------------------------"
    #> [5] "len       30 20.663  6.606   8.200 30.900"
    #> [6] "dose      30 1.167   0.634   0.500 2.000 "
    #> [7] "-----------------------------------------"
    #> --------------------------------------------------------------- 
    #> ToothGrowth$supp: VC
    #> [1] ""                                         
    #> [2] "========================================="
    #> [3] "Statistic N   Mean  St. Dev.  Min   Max  "
    #> [4] "-----------------------------------------"
    #> [5] "len       30 16.963  8.266   4.200 33.900"
    #> [6] "dose      30 1.167   0.634   0.500 2.000 "
    #> [7] "-----------------------------------------"
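If a single table is the goal, one further option is to compute the statistics per group in base R and let stargazer print the resulting data frame as-is with summary = FALSE. The following is a minimal sketch under that assumption; summarise_by() is a hypothetical helper written for this example and is not part of stargazer.

```r
library(stargazer)

# Hypothetical helper (not part of stargazer): one row of summary
# statistics per group/variable combination.
summarise_by <- function(data, vars, group) {
  groups <- split(data, data[[group]])
  rows <- Map(function(d, g) {
    do.call(rbind, lapply(vars, function(v) {
      x <- d[[v]]
      data.frame(
        Group     = g,
        Statistic = v,
        N         = sum(!is.na(x)),
        Mean      = mean(x, na.rm = TRUE),
        SD        = sd(x, na.rm = TRUE),
        Min       = min(x, na.rm = TRUE),
        Max       = max(x, na.rm = TRUE),
        stringsAsFactors = FALSE
      )
    }))
  }, groups, names(groups))
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

tab <- summarise_by(ToothGrowth, vars = c("len", "dose"), group = "supp")

# summary = FALSE prints the data frame as given instead of summarising it
stargazer(tab, type = "text", summary = FALSE, rownames = FALSE, digits = 3)
```

Because the statistics are computed by hand here, stargazer only handles the formatting; any extra statistic has to be added to the helper.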

answered Oct 30 '22 04:10

Eric Fail
