
Recode summary/overview of levels before and after recoding

I have used dplyr::recode on some factors and I am looking for a clean way to make a LaTeX table in which the new and old categories, i.e. levels, are compared.

Here's an illustration of the issue using cyl from mtcars. First, some packages,

# install.packages(c("tidyverse", "stargazer", "reporttools"))
library(tidyverse) 

and the data I intend to use,

mcr <- mtcars %>% select(cyl) %>% as_tibble() 
mcr %>% print(n=5)
#> # A tibble: 32 x 1
#>     cyl
#> * <dbl>
#> 1  6.00
#> 2  6.00
#> 3  4.00
#> 4  6.00
#> 5  8.00
#> # ... with 27 more rows

Now, I create two new factors, one with three categories, cyl_3col, and one with two, cyl_is_red, i.e.:

mcr_col <- mcr %>% as_tibble() %>%
    mutate(cyl_3col = factor(cyl, levels = c(4, 6, 8),labels = c("red", "blue", "green")),
           cyl_is_red = recode(cyl_3col, .default = 'is not red', 'red' = 'is red'))
mcr_col  %>% print(n=5)
#> # A tibble: 32 x 3
#>     cyl cyl_3col cyl_is_red
#>   <dbl> <fct>    <fct>     
#> 1  6.00 blue     is not red
#> 2  6.00 blue     is not red
#> 3  4.00 red      is red    
#> 4  6.00 blue     is not red
#> 5  8.00 green    is not red
#> # ... with 27 more rows

Now, I would like to show how the categories in cyl_3col and cyl_is_red are related.

Maybe something like this would work,

#> cyl_is_red  cyl_3col 
#> is red               
#>             red      
#> is not red           
#>             blue     
#>             green    

or possibly something like this, where I imagine the "is not red" category spanning two rows with \multirow{} or something like it,

#>  cyl_3col   cyl_is_red
#> 1 red       is red    
#> 2 blue      is not red
#> 3 green     ----------

using stargazer or possibly some other TeX tool. I am very open as to how best to show the recoding. I assume someone who came before me has already thought out a smart way to code this?

I've used something like mcr_col %>% count(cyl_3col, cyl_is_red) for now, but I don't think it's really working.
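
For reference, here is a minimal sketch of the level mapping itself, with the counts dropped. It uses base R only and rebuilds the two factors from the example above; the name `level_map` is made up for illustration, and `ifelse()` stands in for the `recode()` call.

```r
# Rebuild the example factors in base R (same recoding as above)
cyl_3col <- factor(mtcars$cyl, levels = c(4, 6, 8),
                   labels = c("red", "blue", "green"))
cyl_is_red <- factor(ifelse(cyl_3col == "red", "is red", "is not red"),
                     levels = c("is red", "is not red"))

# One row per old/new level pair, sorted by new level, then old level
level_map <- unique(data.frame(cyl_is_red, cyl_3col))
level_map <- level_map[order(level_map$cyl_is_red, level_map$cyl_3col), ]
rownames(level_map) <- NULL
print(level_map)
```

This gives the three level pairs I want to show, but still as a flat table with "is not red" repeated rather than merged.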

Eric Fail asked Feb 26 '18 12:02


3 Answers

pixiedust has a merge option.

---
title: "Untitled"
output: pdf_document
header-includes: 
- \usepackage{amssymb} 
- \usepackage{arydshln} 
- \usepackage{caption} 
- \usepackage{graphicx} 
- \usepackage{hhline} 
- \usepackage{longtable} 
- \usepackage{multirow} 
- \usepackage[dvipsnames,table]{xcolor} 
---

```{r}
library(pixiedust)
library(dplyr)

mcr <- mtcars %>% select(cyl) %>% as_tibble() 
mcr_col <- mcr %>% as_tibble() %>%
  mutate(cyl_3col = factor(cyl, levels = c(4, 6, 8),labels = c("red", "blue", "green")),
         cyl_is_red = recode(cyl_3col, .default = 'is not red', 'red' = 'is red'))

mcr_col %>% 
  count(cyl_3col, cyl_is_red) %>% 
  select(-n) %>% 
  dust(float = FALSE) %>% 
  sprinkle(cols = "cyl_is_red",
           rows = 2:3,
           merge = TRUE) %>% 
  sprinkle(sanitize = TRUE,
           part = "head")
```
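
If you would rather not depend on a table package, the same merged-cell layout can be sketched by hand. This is a minimal base-R sketch, not pixiedust's method. The helper `recode_to_latex()` is hypothetical. It assumes the LaTeX `multirow` package is loaded (as in the header above) and that the level labels contain no characters that need LaTeX escaping.

```r
# Hypothetical helper: build a LaTeX tabular in which each new level
# spans the rows of the old levels that were recoded into it.
recode_to_latex <- function(old, new) {
  # One row per old/new pair, grouped by new level
  map <- unique(data.frame(old = old, new = new))
  map <- map[order(map$new, map$old), ]
  rows <- character(0)
  for (lev in unique(as.character(map$new))) {
    olds <- as.character(map$old[map$new == lev])
    # The first row of a group carries the \multirow cell; the rest leave it empty
    first <- sprintf("%s & \\multirow{%d}{*}{%s} \\\\", olds[1], length(olds), lev)
    rest  <- sprintf("%s & \\\\", olds[-1])
    rows  <- c(rows, first, rest, "\\hline")
  }
  c("\\begin{tabular}{ll}", "\\hline",
    "Old level & New level \\\\", "\\hline",
    rows, "\\end{tabular}")
}

# Same recoding as in the question, rebuilt in base R
cyl_3col <- factor(mtcars$cyl, levels = c(4, 6, 8),
                   labels = c("red", "blue", "green"))
cyl_is_red <- factor(ifelse(cyl_3col == "red", "is red", "is not red"),
                     levels = c("is red", "is not red"))

tex <- recode_to_latex(cyl_3col, cyl_is_red)
cat(tex, sep = "\n")
```

In an R Markdown chunk, setting `results='asis'` lets the `cat()` output pass through as raw LaTeX.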

Benjamin answered Nov 03 '22 19:11


A somewhat different way of tackling the problem would be to display the recodings as a plot rather than a table, which avoids generating LaTeX syntax altogether. You could do something like:

library(ggplot2)

# Make some data with lots of levels: cat1 is the original coding (26 letters),
# cat2 is the collapsed coding (three categories)
tdf <- data.frame(cat1 = factor(letters), 
                  cat2 = factor(c(rep("Low", 9), rep("Mid", 9), rep("High", 8))))
# Reorder the levels of cat2 from alphabetical (High, Low, Mid) to Low, Mid, High
tdf$cat2 <- factor(tdf$cat2, levels(tdf$cat2)[c(2,3,1)])

# Now plot it as arrows running from the first encoding to the second
ggplot(tdf) + 
  geom_segment(data=tdf, aes(x=.05, xend = .45, y = cat1, yend = cat2), arrow = arrow()) + 
  geom_text(aes(x=0, y=cat1, label=cat1)) + 
  geom_text(aes(x=.5, y=cat2, label=cat2))+ 
  facet_wrap(~cat2, nrow = 3, scales = "free_y") + 
  theme_classic()+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line = element_blank(),
        strip.background = element_blank(),
        strip.text.y = element_blank()) +
  ggtitle("Variable Recodings")


With lots of variables this might be easier on the reader's eyes.

gfgm answered Nov 03 '22 18:11


If HTML works for you instead of LaTeX, the tableHTML package offers many options.

Here is an example of something you can do with it:

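library(dplyr)
library(tableHTML)

# mcr_col is the data frame built in the question
# One row per combination of old and new level
connections <- mcr_col %>% 
  count(cyl_3col, cyl_is_red) 

# Number of old levels within each new level, used as the row-group sizes
groups <- connections %>% 
  group_by(cyl_is_red) %>% 
  summarise(cnt = n())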


tableHTML(connections %>% 
            select(-n, -cyl_is_red), 
          rownames = FALSE,
          row_groups = list(groups$cnt, groups$cyl_is_red))

DS_UNI answered Nov 03 '22 17:11