Recode summary/overview of levels before and after recoding

I have used dplyr::recode on some factors and I am looking for a clean way to make a LaTeX table in which the new and old categories, i.e. levels, are compared.

Here's an illustration of the issue using cyl from `mtcars`. First some packages,

# install.packages(c("tidyverse", "stargazer", "reporttools"))
library(tidyverse)

and the data I intend to use,

mcr <- mtcars %>% select(cyl) %>% as_tibble() 
mcr %>% print(n=5)
#> # A tibble: 32 x 1
#>     cyl
#> * <dbl>
#> 1  6.00
#> 2  6.00
#> 3  4.00
#> 4  6.00
#> 5  8.00
#> # ... with 27 more rows

Now, I create two new factors, one with three categories, cyl_3col, and one with two, cyl_is_red, i.e.:

mcr_col <- mcr %>% as_tibble() %>%
    mutate(cyl_3col = factor(cyl, levels = c(4, 6, 8),labels = c("red", "blue", "green")),
           cyl_is_red = recode(cyl_3col, .default = 'is not red', 'red' = 'is red'))
mcr_col  %>% print(n=5)
#> # A tibble: 32 x 3
#>     cyl cyl_3col cyl_is_red
#>   <dbl> <fct>    <fct>     
#> 1  6.00 blue     is not red
#> 2  6.00 blue     is not red
#> 3  4.00 red      is red    
#> 4  6.00 blue     is not red
#> 5  8.00 green    is not red
#> # ... with 27 more rows

Now, I would like to show how the categories in cyl_3col and cyl_is_red are related.

Maybe something like this is better,

#> cyl_is_red  cyl_3col 
#> is red               
#>             red      
#> is not red           
#>             blue     
#>             green    

Possibly something like this, where I imagine the is not red category spanning two rows with \multirow{} or something like it:

#>  cyl_3col   cyl_is_red
#> 1 red       is red    
#> 2 blue      is not red
#> 3 green     ----------

I would like to do this using stargazer or possibly some other TeX tool. I am very open as to how best to show the recoding. I assume there's some smart way to code this, thought out by someone who came before me?

I've used something like mcr_col %>% count(cyl_3col, cyl_is_red) for now, but I don't think it's really working.

3 Answers

pixiedust has a merge option.

---
title: "Untitled"
output: pdf_document
header-includes:
- \usepackage{amssymb}
- \usepackage{arydshln}
- \usepackage{caption}
- \usepackage{graphicx}
- \usepackage{hhline}
- \usepackage{longtable}
- \usepackage{multirow}
- \usepackage[dvipsnames,table]{xcolor}
---


library(tidyverse)
library(pixiedust)

mcr <- mtcars %>% select(cyl) %>% as_tibble() 
mcr_col <- mcr %>% as_tibble() %>%
  mutate(cyl_3col = factor(cyl, levels = c(4, 6, 8),labels = c("red", "blue", "green")),
         cyl_is_red = recode(cyl_3col, .default = 'is not red', 'red' = 'is red'))

mcr_col %>% 
  count(cyl_3col, cyl_is_red) %>% 
  select(-n) %>% 
  dust(float = FALSE) %>% 
  sprinkle(cols = "cyl_is_red",
           rows = 2:3,
           merge = TRUE) %>% 
  sprinkle(sanitize = TRUE,
           part = "head")
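
For comparison, the merged-cell layout the question asks about can also be written out by hand. The following is a minimal base-R sketch, not the pixiedust approach: it rebuilds the two factors without dplyr, and `recode_table_tex()` is a hypothetical helper name, not a function from any package. It assumes the multirow LaTeX package is loaded, as in the header above.

```r
# Rebuild the question's two factors with base R only.
cyl_3col <- factor(mtcars$cyl, levels = c(4, 6, 8),
                   labels = c("red", "blue", "green"))
cyl_is_red <- factor(ifelse(cyl_3col == "red", "is red", "is not red"),
                     levels = c("is red", "is not red"))

# One row per observed (old level, new level) pair, ordered by the new level.
mapping <- unique(data.frame(cyl_3col, cyl_is_red))
mapping <- mapping[order(mapping$cyl_is_red, mapping$cyl_3col), ]

# Hypothetical helper: returns the lines of a LaTeX tabular.
# It expects `new` to be sorted so that equal values are adjacent.
recode_table_tex <- function(old, new, header = c("old", "new")) {
  old <- as.character(old)
  new <- as.character(new)
  runs <- rle(new)                                # runs of equal new levels
  first <- cumsum(c(1, head(runs$lengths, -1)))   # row where each run starts
  new_cell <- rep("", length(new))                # continuation rows stay empty
  new_cell[first] <- ifelse(
    runs$lengths > 1,
    sprintf("\\multirow{%d}{*}{%s}", runs$lengths, runs$values),
    runs$values
  )
  body <- paste0(old, " & ", new_cell, " \\\\")
  c("\\begin{tabular}{ll}", "\\hline",
    paste0(paste(header, collapse = " & "), " \\\\"), "\\hline",
    body, "\\hline", "\\end{tabular}")
}

tex <- recode_table_tex(mapping$cyl_3col, mapping$cyl_is_red,
                        header = c("cyl\\_3col", "cyl\\_is\\_red"))
writeLines(tex)
```

The output can be pasted into a LaTeX document or emitted from a chunk with `results = 'asis'`. Only a new level that covers more than one old level gets a `\multirow` cell, so one-to-one recodes are left as plain rows.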


A somewhat different way of tackling the problem would be to display the recodings as a plot rather than a table, which avoids generating LaTeX syntax altogether. You could do something like:

library(ggplot2)

# Here I make some data with lots of levels:
# the alphabet (cat1) collapsed down to three categories (cat2)
tdf <- data.frame(cat1 = factor(letters), 
                  cat2 = factor(c(rep("Low", 9), rep("Mid", 9), rep("High", 8))))
# Reorder the levels of cat2 from alphabetical (High, Low, Mid) to Low, Mid, High
tdf$cat2 <- factor(tdf$cat2, levels(tdf$cat2)[c(2,3,1)])

# Now plot it as arrows running from the first encoding to the second
ggplot(tdf) + 
  geom_segment(aes(x = .05, xend = .45, y = cat1, yend = cat2), arrow = arrow()) + 
  geom_text(aes(x = 0, y = cat1, label = cat1)) + 
  geom_text(aes(x = .5, y = cat2, label = cat2)) + 
  facet_wrap(~cat2, nrow = 3, scales = "free_y") + 
  theme(axis.line = element_blank(),
        strip.background = element_blank(),
        strip.text.y = element_blank()) +
  ggtitle("Variable Recodings")


With lots of variables this might be easier on the reader's eyes.
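
The example above uses made-up data. To draw the same arrows for a real recode, the plot only needs one row per old level together with its new level. Here is a minimal base-R sketch, assuming the two factors from the question. `recode_edges()` is a hypothetical helper name, and the columns are called `cat1` and `cat2` only so that the result can stand in for `tdf` in the plotting code.

```r
# Hypothetical helper: one row per (old level, new level) pair.
recode_edges <- function(old, new) {
  stopifnot(is.factor(old), is.factor(new), length(old) == length(new))
  edges <- unique(data.frame(cat1 = old, cat2 = new))
  edges <- edges[order(edges$cat2, edges$cat1), ]
  rownames(edges) <- NULL
  # A recode should send each old level to a single new level.
  if (anyDuplicated(edges$cat1)) {
    stop("some old levels map to more than one new level")
  }
  droplevels(edges)
}

# The question's factors, rebuilt without dplyr.
cyl_3col <- factor(mtcars$cyl, levels = c(4, 6, 8),
                   labels = c("red", "blue", "green"))
cyl_is_red <- factor(ifelse(cyl_3col == "red", "is red", "is not red"),
                     levels = c("is red", "is not red"))

edges <- recode_edges(cyl_3col, cyl_is_red)
print(edges)
```

The duplicate check is a cheap safeguard: if two variables that are not a recode of each other are passed in by mistake, the function stops instead of drawing misleading arrows.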

If HTML works for you instead of LaTeX, the tableHTML package offers many options.

Here is an example of what you can do with it:


library(tidyverse)
library(tableHTML)

# mcr_col as defined in the question
connections <- mcr_col %>% 
  count(cyl_3col, cyl_is_red) 

groups <- connections %>% 
  group_by(cyl_is_red) %>% 
  summarise(cnt = length(cyl_3col))

tableHTML(connections %>% 
            select(-n, -cyl_is_red), 
          rownames = FALSE,
          row_groups = list(groups$cnt, groups$cyl_is_red))
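
In the call above, `row_groups` is given a list of two parallel vectors: the number of consecutive rows each group spans, and the label for each group. Here is a small base-R sketch of that bookkeeping, which may help when checking that the counts line up with the row order. It uses `rle()` instead of `group_by()`, and the `new_level` vector is typed in by hand for illustration, assuming the rows are in the order red, blue, green.

```r
# New-level label for each table row, in display order (red, blue, green).
new_level <- c("is red", "is not red", "is not red")

# Run-length encoding gives the span and label of each block of equal rows.
runs <- rle(new_level)
row_groups <- list(runs$lengths, runs$values)
str(row_groups)

# The spans must add up to the number of table rows.
stopifnot(sum(row_groups[[1]]) == length(new_level))
```

Unlike `group_by()` followed by `summarise()`, which orders groups by factor level, `rle()` follows the row order. If rows of the same group are not adjacent, they show up as separate runs, which makes a sorting problem visible before the table is rendered.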
