
Output each factor level as dummy variable in stargazer summary statistics table

I'm using the R package stargazer to create high-quality regression tables, and I would like to use it to create a summary statistics table. I have a factor variable in my data, and I would like the summary table to show me the percent in each category of the factor -- in effect, separate the factor into a set of mutually exclusive logical (dummy) variables, and then display those in the table. Here's an example:

> library(car)
> library(stargazer)
> data(Blackmore)
> stargazer(Blackmore[, c("age", "exercise", "group")], type = "text")

Statistic  N   Mean  St. Dev.  Min   Max  
age       945 11.442  2.766   8.000 17.920
exercise  945 2.531   3.495   0.000 29.960

But I'm trying to get an additional row that shows me the percent in each group (% control and/or % patient, in these data). I'm sure this is just an option somewhere in stargazer, but I can't find it. Does anyone know what it is?

Edit: car::Blackmoor has updated spelling to car::Blackmore.

Jake Fisher asked Nov 13 '14 15:11

3 Answers

Since stargazer can't do this directly, you can build your own summary table as a data frame and output it with pander, xtable, or any other table package. For example, here's how to use dplyr and tidyr to create one:


library(car)    # provides the Blackmore data
library(dplyr)
library(tidyr)

fancy.summary <- Blackmore %>%
  select(-subject) %>%  # Remove the subject column
  group_by(group) %>%  # Group by patient and control
  summarise(across(everything(),
                   list(mean = mean, sd = sd, min = min, max = max, length = length))) %>%  # Calculate summary statistics for each group
  mutate(prop = age_length / sum(age_length)) %>%  # Calculate proportion of rows in each group
  gather(variable, value, -group, -prop) %>%  # Convert to long
  separate(variable, c("variable", "statistic")) %>%  # Split names like "age_mean" into variable and statistic
  mutate(statistic = ifelse(statistic == "length", "n", statistic)) %>%  # Rename "length" to "n"
  spread(statistic, value) %>%  # Make the statistics be actual columns
  select(group, variable, n, mean, sd, min, max, prop)  # Reorder columns

Which results in this if you use pander:



 group   variable   n   mean   sd    min   max   prop 
------- ---------- --- ------ ----- ----- ----- ------
control    age     359 11.26  2.698   8   17.92 0.3799

control  exercise  359 1.641  1.813   0   11.54 0.3799

patient    age     586 11.55  2.802   8   17.92 0.6201

patient  exercise  586 3.076  4.113   0   29.96 0.6201
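
As a quick cross-check of the prop column, the same shares can be computed in base R with table() and prop.table(). This is only a sketch: the group vector below is a hypothetical stand-in rebuilt from the counts shown in the table above (359 control, 586 patient), not the Blackmore data itself.

```r
# Hypothetical stand-in for Blackmore$group, rebuilt from the counts above.
group <- rep(c("control", "patient"), times = c(359, 586))

counts <- table(group)        # rows per group
shares <- prop.table(counts)  # each count divided by the total

summary_df <- data.frame(
  group = names(counts),
  n     = as.vector(counts),
  prop  = round(as.vector(shares), 4)
)
summary_df
```

With the real data, table(Blackmore$group) would take the place of the rebuilt vector.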
Andrew answered Nov 15 '22 00:11


Another workaround is to use model.matrix to create dummy variables in a separate step, and then use stargazer to create a table from that. To show this with the example:

> library(car)
> library(stargazer)
> data(Blackmore)
> options(na.action = "na.pass")  # so that we keep missing values in the data
> X <- model.matrix(~ age + exercise + group - 1, data = Blackmore)
> X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
> names(X.df) <- colnames(X)  # keep the model.matrix column names on the data frame
> stargazer(X.df, type = "text")

Statistic     N   Mean  St. Dev.  Min   Max  
age          945 11.442  2.766   8.000 17.920
exercise     945 2.531   3.495   0.000 29.960
groupcontrol 945 0.380   0.486     0     1   
grouppatient 945 0.620   0.486     0     1   
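
To see why the mean of each dummy column is the share of that level, here is a minimal base R sketch. It assumes a toy data frame (hypothetical, named toy) whose group factor is rebuilt from the counts reported in this thread, so it runs without car or stargazer.

```r
# Hypothetical stand-in for the Blackmore group column: 359 control, 586 patient.
toy <- data.frame(
  group = factor(rep(c("control", "patient"), times = c(359, 586)))
)

# "- 1" drops the intercept, so every factor level gets its own 0/1 column
# instead of one level being absorbed as the reference category.
X <- model.matrix(~ group - 1, data = toy)

colnames(X)            # "groupcontrol" "grouppatient"
round(colMeans(X), 3)  # share of rows in each level
all(rowSums(X) == 1)   # each row belongs to exactly one level
```

Because each row has a single 1 across the dummy columns, the column means sum to one and can be read directly as proportions.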

Edit: car::Blackmoor has updated spelling to car::Blackmore.

Jake Fisher answered Nov 14 '22 23:11

The package tables can be useful for this task.


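library(car)     # provides the Blackmore data
library(tables)
data(Blackmore)

# percent only:
# (the call was cut off after the comma in the source; data = Blackmore is assumed)
(x <- tabular((Factor(group, "") ) ~ (Pct=Percent()) * Format(digits=4),
              data = Blackmore))
##         Pct  
## control 37.99
## patient 62.01

# percent and counts:
(x <- tabular((Factor(group, "") ) ~ ((n=1) + (Pct=Percent())) * Format(digits=4),
              data = Blackmore))
##         n      Pct   
## control 359.00  37.99
## patient 586.00  62.01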

Then it's straightforward to output this to LaTeX:

> latex(x)
  & n & \multicolumn{1}{c}{Pct} \\ 
control  & $359.00$ & $\phantom{0}37.99$ \\
patient  & $586.00$ & $\phantom{0}62.01$ \\
landroni answered Nov 15 '22 01:11
