I would like to make a cross tab in R using dplyr
. I have good reasons for not just using the base table()
command.
table(mtcars$cyl, mtcars$gear)
3 4 5
4 1 8 2
6 2 4 1
8 12 0 2
library(dplyr)
library(tidyr)
mtcars %>%
group_by(cyl, gear) %>%
tally() %>%
spread(gear, n, fill = 0)
Source: local data frame [3 x 4]
cyl 3 4 5
1 4 1 8 2
2 6 2 4 1
3 8 12 0 2
This is all well and good. But it seems to fall apart when there are missing values in the group_by()
variables.
mtcars %>%
mutate(
cyl = ifelse(cyl > 6, NA, cyl),
gear = ifelse(gear > 4, NA, gear)
) %>%
group_by(cyl, gear) %>%
tally()
Source: local data frame [8 x 3]
Groups: cyl
cyl gear n
1 4 3 1
2 4 4 8
3 4 NA 2
4 6 3 2
5 6 4 4
6 6 NA 1
7 NA 3 12
8 NA NA 2
# DITTO # %>%
spread(gear, n)
Error in if (any(names2(x) == "")) { :
missing value where TRUE/FALSE needed
I guess what I would like is for a NA
column like when you do table(..., useNA = "always")
. Any tips?
You can use the subcommand MISSING=INCLUDE which will include in the crosstab user defined missing values. If you have system missing values then only workaround is to temporarily recode them to a valid value. TEMPORARY. RECODE V1 V2 (SYSMIS=-99).
One option is to replace the NA
s with a label. This can be accomplished easily with mutate_each
:
mtcars %>%
mutate(
cyl = ifelse(cyl > 6, NA, cyl),
gear = ifelse(gear > 4, NA, gear)
) %>%
group_by(cyl, gear) %>%
tally() %>%
ungroup() %>%
mutate_each(funs(replace(., is.na(.), 'missing'))) %>%
spread(gear, n)
# cyl 3 4 missing
# 1 4 1 8 2
# 2 6 2 4 1
# 3 missing 12 NA 2
Agreed that the permanent solution to this should be a tidyr bug fix, but in the meantime, this can be worked around by dropping the dplyr tbl_df
format:
mtcars %>%
mutate(
cyl = ifelse(cyl > 6, NA, cyl),
gear = ifelse(gear > 4, NA, gear)
) %>%
group_by(cyl, gear) %>%
tally() %>%
data.frame() %>% ### <-- go from tbl_df to data.frame
spread(gear, n)
cyl 3 4 NA
1 4 1 8 2
2 6 2 4 1
3 NA 12 NA 2
The addition of the data.frame()
call allows your code to run, though it produces a column named NA
so this is probably best suited for exploratory analyses that print to the console.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With