Output each factor level as dummy variable in stargazer summary statistics table

Tags:

r

regression

stargazer

I'm using the R package stargazer to create high-quality regression tables, and I would like to use it to create a summary statistics table as well. I have a factor variable in my data, and I would like the summary table to show the percentage of observations in each category of the factor. In effect, I want to split the factor into a set of mutually exclusive logical (dummy) variables and then display those in the table. Here's an example:

> library(car)
> library(stargazer)
> data(Blackmore)
> stargazer(Blackmore[, c("age", "exercise", "group")], type = "text")

==========================================
Statistic  N   Mean  St. Dev.  Min   Max  
------------------------------------------
age       945 11.442  2.766   8.000 17.920
exercise  945 2.531   3.495   0.000 29.960
------------------------------------------

But I'm trying to get an additional row that shows the percentage in each group (% control and/or % patient, in these data). I'm sure this is just an option somewhere in stargazer, but I can't find it. Does anyone know what it is?

Edit: car::Blackmoor has since been renamed car::Blackmore.
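For reference, the row the question asks for is just the mean of a 0/1 indicator: the mean of a dummy variable for one level equals the share of observations in that level. A minimal base-R sketch of splitting a factor into mutually exclusive dummies, using a small hypothetical factor in place of `Blackmore$group` (the values are made up, not the real data):

```r
# Hypothetical toy factor standing in for Blackmore$group
group <- factor(c("control", "patient", "patient", "control", "patient"))

# One 0/1 column per level of the factor (column names are the levels)
dummies <- sapply(levels(group), function(l) as.numeric(group == l))

# The mean of each dummy column is the proportion of observations in that level
colMeans(dummies)   # control 0.4, patient 0.6
```

Because every observation falls in exactly one level, each row of `dummies` sums to 1, which is what makes the dummies mutually exclusive.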


asked Nov 13 '14 15:11

Jake Fisher

3 Answers

Since stargazer can't do this directly, you can create your own summary table as a data frame and output that using pander, xtable, or any other package. For example, here's how you can use dplyr and tidyr to create a summary table:

library(car)
library(dplyr)
library(tidyr)
data(Blackmore)  # called Blackmoor in older versions of car

fancy.summary <- Blackmore %>%
  select(-subject) %>%  # Remove the subject column
  group_by(group) %>%  # Group by patient and control
  # Calculate summary statistics for each group; across() replaces the deprecated summarise_each(funs(...))
  summarise(across(everything(),
                   list(mean = mean, sd = sd, min = min, max = max, length = length))) %>%
  mutate(prop = age_length / sum(age_length)) %>%  # Calculate proportion in each group
  gather(variable, value, -group, -prop) %>%  # Convert to long
  separate(variable, c("variable", "statistic")) %>%  # Split e.g. "age_mean" into "age" and "mean"
  mutate(statistic = ifelse(statistic == "length", "n", statistic)) %>%  # Rename length to n
  spread(statistic, value) %>%  # Make the statistics be actual columns
  select(group, variable, n, mean, sd, min, max, prop)  # Reorder columns

Printing the result with pander gives:

library(pander)

pandoc.table(fancy.summary)

------------------------------------------------------
 group   variable   n   mean   sd    min   max   prop 
------- ---------- --- ------ ----- ----- ----- ------
control    age     359 11.26  2.698   8   17.92 0.3799

control  exercise  359 1.641  1.813   0   11.54 0.3799

patient    age     586 11.55  2.802   8   17.92 0.6201

patient  exercise  586 3.076  4.113   0   29.96 0.6201
------------------------------------------------------
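For readers who would rather not depend on dplyr and tidyr, here is a base-R sketch of the same idea. It is an alternative technique, not the answerer's code, and it uses a small hypothetical data frame `toy` in place of Blackmore; the helper `summ_stats` is an illustrative name:

```r
# Hypothetical toy data standing in for Blackmore (not the real values)
toy <- data.frame(
  age      = c(10, 12, 14, 9, 11),
  exercise = c(1.0, 3.5, 2.0, 0.5, 4.0),
  group    = factor(c("control", "patient", "patient", "control", "patient"))
)

# Summary statistics for one variable, returned as a named vector
summ_stats <- function(v) c(n = length(v), mean = mean(v), sd = sd(v),
                            min = min(v), max = max(v))

# For each numeric variable, one row per group
rows <- lapply(c("age", "exercise"), function(var) {
  s <- do.call(rbind, tapply(toy[[var]], toy$group, summ_stats))
  data.frame(group = rownames(s), variable = var, s,
             row.names = NULL, stringsAsFactors = FALSE)
})
summary_df <- do.call(rbind, rows)

# Share of observations in each group (repeated on every row of that group)
summary_df$prop <- summary_df$n / nrow(toy)

# Sort by group, then variable, to match the layout of the dplyr version
summary_df <- summary_df[order(summary_df$group, summary_df$variable), ]
rownames(summary_df) <- NULL
summary_df
```

The resulting data frame has the same columns as `fancy.summary`, so it can be passed to `pandoc.table()` in the same way.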


answered Nov 15 '22 00:11

Andrew

Another workaround is to create the dummy variables in a separate step with model.matrix and then build the stargazer table from those. Using the example data:

> library(car)
> library(stargazer)
> data(Blackmore)
> 
> old <- options(na.action = "na.pass")  # so that we keep missing values in the data
> # "- 1" drops the intercept, so every level of group gets its own dummy
> X <- model.matrix(~ age + exercise + group - 1, data = Blackmore)
> options(old)  # restore the previous na.action setting
> X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
> stargazer(X.df, type = "text")

=============================================
Statistic     N   Mean  St. Dev.  Min   Max  
---------------------------------------------
age          945 11.442  2.766   8.000 17.920
exercise     945 2.531   3.495   0.000 29.960
groupcontrol 945 0.380   0.486     0     1   
grouppatient 945 0.620   0.486     0     1   
---------------------------------------------
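The `- 1` in the formula is what produces both `groupcontrol` and `grouppatient`: with an intercept, `model.matrix` uses treatment contrasts and drops the first level as the reference category. A small sketch on a hypothetical toy data frame (made-up values, not Blackmore) showing the difference:

```r
# Hypothetical toy data standing in for Blackmore
toy <- data.frame(
  age   = c(10, 12, 14, 9, 11),
  group = factor(c("control", "patient", "patient", "control", "patient"))
)

# With an intercept, the first level ("control") is the reference and is dropped
with_int <- model.matrix(~ age + group, data = toy)
colnames(with_int)   # "(Intercept)" "age" "grouppatient"

# Without the intercept, every level of group gets its own dummy
no_int <- model.matrix(~ age + group - 1, data = toy)
colnames(no_int)     # "age" "groupcontrol" "grouppatient"

# Column means of the dummies are the group proportions
colMeans(no_int[, c("groupcontrol", "grouppatient")])   # 0.4 0.6
```

These column means are what stargazer reports in the Mean column for the dummy rows.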

Edit: car::Blackmoor has since been renamed car::Blackmore.
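Dropping the intercept only gives full coding to the first factor in the formula; a second factor would still lose its reference level. One common idiom, sketched here on hypothetical toy data with a made-up second factor `sex`, is to pass `contrasts.arg` with identity matrices from `contrasts(..., contrasts = FALSE)` so every factor is expanded to one column per level:

```r
# Hypothetical toy data; "sex" is an illustrative second factor, not in Blackmore
toy <- data.frame(
  age   = c(10, 12, 14, 9, 11),
  group = factor(c("control", "patient", "patient", "control", "patient")),
  sex   = factor(c("f", "m", "f", "f", "m"))
)

# For each factor, an identity "contrast" matrix: one indicator per level
is_fac <- vapply(toy, is.factor, logical(1))
full_coding <- lapply(toy[is_fac], contrasts, contrasts = FALSE)

X <- model.matrix(~ age + group + sex - 1, data = toy, contrasts.arg = full_coding)
colnames(X)   # "age" "groupcontrol" "grouppatient" "sexf" "sexm"

X.df <- data.frame(X)
# stargazer::stargazer(X.df, type = "text")  # commented out so the sketch runs without stargazer
```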

answered Nov 14 '22 23:11

Jake Fisher

The tables package can be useful for this task: it computes the counts and percentages for each level of a factor directly.

library(car)
library(tables)
data(Blackmore)

# percent only:
(x <- tabular((Factor(group, "") ) ~ (Pct=Percent()) * Format(digits=4), 
    data=Blackmore))
##              
##         Pct  
## control 37.99
## patient 62.01

# percent and counts:
(x <- tabular((Factor(group, "") ) ~ ((n=1) + (Pct=Percent())) * Format(digits=4), 
    data=Blackmore))
##                      
##         n      Pct   
## control 359.00  37.99
## patient 586.00  62.01

Then it's straightforward to output this to LaTeX:

> latex(x)
\begin{tabular}{lcc}
\hline
  & n & \multicolumn{1}{c}{Pct} \\ 
\hline
control  & $359.00$ & $\phantom{0}37.99$ \\
patient  & $586.00$ & $\phantom{0}62.01$ \\
\hline 
\end{tabular}
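If the tables package is not available, a similar count-and-percentage table can be assembled in base R. A minimal sketch, again using a hypothetical toy factor rather than the real data; the result is an ordinary matrix that can be passed to whichever table-export package you prefer:

```r
# Hypothetical toy factor standing in for Blackmore$group
group <- factor(c("control", "patient", "patient", "control", "patient"))

counts <- table(group)

# Counts alongside percentages rounded to two decimals, one row per level
tab <- cbind(n   = as.vector(counts),
             Pct = round(100 * as.vector(prop.table(counts)), 2))
rownames(tab) <- names(counts)
tab
```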

answered Nov 15 '22 01:11

landroni
