Stargazer organize summary statistics by group (r) [duplicate]

I would like to use stargazer to produce summary statistics for each category of a grouping variable. I could do it in separate tables, but I'd like it all in one – if that is not unreasonably challenging for this package.

For example

library(stargazer)
stargazer(ToothGrowth, type = "text")
#> 
#> =========================================
#> Statistic N   Mean  St. Dev.  Min   Max  
#> -----------------------------------------
#> len       60 18.813  7.649   4.200 33.900
#> dose      60 1.167   0.629   0.500 2.000 
#> -----------------------------------------

provides summary statistics for the continuous variables in ToothGrowth. I would like to split that summary by the categorical variable supp, also in ToothGrowth.

Two suggestions for the desired output (the calls below are hypothetical syntax, and the tables are mock-ups):

stargazer(ToothGrowth ~ supp, type = "text")
#> 
#> ==================================================
#> Statistic         N   Mean   St. Dev.  Min   Max  
#> --------------------------------------------------
#> OJ       len       30 20.663  6.606   8.200 30.900
#>          dose      30  1.167  0.634   0.500  2.000
#> VC       len       30 16.963  8.266   4.200 33.900
#>          dose      30  1.167  0.634   0.500  2.000
#> --------------------------------------------------
#> 
stargazer(ToothGrowth ~ supp, type = "text")
#> 
#> ==================================================
#> Statistic          N   Mean   St. Dev.  Min   Max  
#> --------------------------------------------------
#> len               
#>        _by OJ     30 20.663  6.606   8.200 30.900
#>        _by VC     30 16.963  8.266   4.200 33.900
#> _tot              60 18.813  7.649   4.200 33.900
#> 
#> dose             
#>        _by OJ     30  1.167  0.634   0.500  2.000
#>        _by VC     30  1.167  0.634   0.500  2.000
#> _tot              60  1.167  0.629   0.500  2.000
#> --------------------------------------------------
Michael asked Nov 22 '22 10:11


1 Answer

Solution

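library(stargazer)
library(dplyr)
library(tidyr)

ToothGrowth %>%
    # Number the observations within each supp group so rows can be matched up
    group_by(supp) %>%
    mutate(id = row_number()) %>%
    ungroup() %>%
    # Long format: one row per (supp, id, variable) with its value
    gather(temp, val, len, dose) %>%
    # Combine group and variable names, e.g. "OJ_len"
    unite(temp1, supp, temp, sep = '_') %>%
    # Wide format: one column per group-variable combination
    spread(temp1, val) %>%
    # Drop the helper id so stargazer() does not summarize it
    select(-id) %>%
    # Convert the tibble to a plain data.frame before passing it to stargazer()
    as.data.frame() %>%
    stargazer(type = 'text')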

Result

=========================================
Statistic N   Mean  St. Dev.  Min   Max  
-----------------------------------------
OJ_dose   30 1.167   0.634   0.500 2.000 
OJ_len    30 20.663  6.606   8.200 30.900
VC_dose   30 1.167   0.634   0.500 2.000 
VC_len    30 16.963  8.266   4.200 33.900
-----------------------------------------
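
If the group and the variable should sit in separate columns, closer to the first layout sketched in the question, one possible route is to compute the statistics by hand and let stargazer() print the resulting data frame as-is via its summary = FALSE argument. This is only a sketch: the names vars, rows and by_group, and the column labels, are my own choices, and the group label is repeated on every row rather than shown once per group.

```r
library(stargazer)

vars <- c("len", "dose")

# One small data frame of summary statistics per level of supp
rows <- lapply(split(ToothGrowth, ToothGrowth$supp), function(g) {
    data.frame(
        supp      = as.character(g$supp[1]),
        Statistic = vars,
        N         = sapply(g[vars], length),
        Mean      = sapply(g[vars], mean),
        St.Dev    = sapply(g[vars], sd),
        Min       = sapply(g[vars], min),
        Max       = sapply(g[vars], max)
    )
})

# Stack the per-group tables into a single data frame
by_group <- do.call(rbind, rows)

# summary = FALSE prints the data frame contents instead of summarizing them
stargazer(by_group, summary = FALSE, rownames = FALSE, digits = 3, type = "text")
```

The trade-off is that the statistics are no longer computed by stargazer itself, so any change to which statistics are reported has to be made in the data.frame() call.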

Explanation

This addresses the problem the OP raised in a comment on the original answer: "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I saw to do that with stargazer was to build a new data frame with one column per group-variable combination, using a gather(), unite(), spread() strategy. The only trick is that spread() needs each row to be uniquely identified, so I create an id within each group to avoid duplicate identifiers and drop that variable again before calling stargazer().
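
In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(), so the same reshaping idea might be written as below. This is a sketch that assumes tidyr 1.0.0 or later; the names wide, variable and value are my own, and the column order may differ from the table above.

```r
library(stargazer)
library(dplyr)
library(tidyr)

wide <- ToothGrowth %>%
    # Unique id within each supp group, needed to line rows up when widening
    group_by(supp) %>%
    mutate(id = row_number()) %>%
    ungroup() %>%
    # Long format: one row per (supp, id, variable)
    pivot_longer(c(len, dose), names_to = "variable", values_to = "value") %>%
    # Wide format: columns named <supp>_<variable>, e.g. OJ_len
    pivot_wider(names_from = c(supp, variable), values_from = value) %>%
    # Drop the helper id and convert the tibble to a plain data.frame
    select(-id) %>%
    as.data.frame()

stargazer(wide, type = "text")
```

Passing both supp and variable to names_from replaces the separate unite() step, since pivot_wider() joins the two with "_" by default.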

duckmayr answered Nov 23 '22 23:11