I am trying to roll up a bunch of rows for one day into a single row. I would like it in dplyr if possible. I know that my code is far from correct, but this was how far I got:
data %>%
group_by(DAY) %>%
summarise_each(funs(Sum = n()), SEX, GROUP, TOTAL)
Original:
DAY SEX GROUP TOTAL
7/1/14 FEMALE A 1
7/1/14 FEMALE B 1
7/1/14 FEMALE B 1
7/1/14 FEMALE A 1
7/1/14 MALE A 1
7/1/14 MALE B 2
New:
DAY FEMALE MALE GROUP_A GROUP_B TOTAL
7/1/14 4 2 3 3 7
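For reference, here is a minimal sketch in current dplyr (assumed 1.0 or later) that produces the "New" table with a plain summarise(). It assumes the level names (FEMALE, MALE, A, B) are known in advance. The data frame below is retyped from the question's "Original" table. Note that summarise_each() and funs(), used in the attempt above, are deprecated in current dplyr.

```r
library(dplyr)

# The question's single-day sample data (SEX and GROUP stored as factors)
data <- data.frame(
  DAY   = rep("7/1/14", 6),
  SEX   = factor(c("FEMALE", "FEMALE", "FEMALE", "FEMALE", "MALE", "MALE")),
  GROUP = factor(c("A", "B", "B", "A", "A", "B")),
  TOTAL = c(1L, 1L, 1L, 1L, 1L, 2L)
)

# One row per DAY: count the rows of each level and add up TOTAL.
# The level names are hard-coded, so a new level needs a new line.
res <- data %>%
  group_by(DAY) %>%
  summarise(
    FEMALE  = sum(SEX == "FEMALE"),
    MALE    = sum(SEX == "MALE"),
    GROUP_A = sum(GROUP == "A"),
    GROUP_B = sum(GROUP == "B"),
    TOTAL   = sum(TOTAL)
  )
print(res)
```

This counts rows per level (MALE is 2, not the summed TOTAL of 3), which matches the desired output in the question.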
Another way with data.table, tested on a data.frame with more than one day (the data is given at the end of this answer):
library(data.table)
# one row per DAY: counts for each level of SEX and GROUP, plus the summed TOTAL
setDT(data)[, as.list(c(table(SEX), table(GROUP), TOTAL = sum(TOTAL))), by = DAY]
# DAY FEMALE MALE A B TOTAL
#1: 7/1/14 3 0 1 2 3
#2: 8/1/14 1 2 2 1 4
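The 0 in the MALE column for 7/1/14 comes from table() counting every level of a factor, including levels that do not occur in the current group. A small illustration (the variable names are made up for this example): the factor version reports MALE with a count of 0, while the character version reports FEMALE only. So this approach relies on SEX and GROUP being factors, as they are in the data below; with character columns, a day missing a level would yield fewer entries than the other days.

```r
# A factor with both levels declared, but only FEMALE observed (like 7/1/14)
sex_day1 <- factor(c("FEMALE", "FEMALE", "FEMALE"), levels = c("FEMALE", "MALE"))
counts_factor <- table(sex_day1)  # counts every declared level, MALE included
print(counts_factor)

# The same values as a plain character vector: only observed values are counted
counts_char <- table(c("FEMALE", "FEMALE", "FEMALE"))
print(counts_char)
```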
EDIT: another, less manual, option (you don't need to know which variables are factors and which are numeric), thanks to some help from @jangorecki and @DavidArenburg. It assumes DAY is the first column, which is why [-1] drops the first entry:
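# flag numeric and factor columns, dropping the first entry (DAY, the grouping column)
wh_num  <- sapply(data, is.numeric)[-1]
wh_fact <- sapply(data, is.factor)[-1]
# tabulate every factor column, sum every numeric one, then flatten to a named vector
setDT(data)[, as.list(c(lapply(.SD[, wh_fact, with = FALSE], table),
                        lapply(.SD[, wh_num, with = FALSE], sum),
                        recursive = TRUE)), by = DAY]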
# DAY SEX.FEMALE SEX.MALE GROUP.A GROUP.B TOTAL
#1: 7/1/14 3 0 1 2 3
#2: 8/1/14 1 2 2 1 4
The data used above:
data <- structure(list(DAY = c("7/1/14", "7/1/14", "7/1/14", "8/1/14",
"8/1/14", "8/1/14"), SEX = structure(c(1L, 1L, 1L, 1L, 2L, 2L
), .Label = c("FEMALE", "MALE"), class = "factor"), GROUP = structure(c(1L,
2L, 2L, 1L, 1L, 2L), .Label = c("A", "B"), class = "factor"),
TOTAL = c(1L, 1L, 1L, 1L, 1L, 2L)), .Names = c("DAY", "SEX",
"GROUP", "TOTAL"), row.names = c(NA, -6L), class = "data.frame")
With dplyr, it may seem a little arcane, but here is a short incantation (run on the single-day data from the question):
data %>% group_by(DAY) %>%
  summarise_each(funs(if (is.numeric(.)) sum(.) else list(table(.)))) -> res
data.frame(DAY = res$DAY, t(unlist(res[, 2:ncol(res)])))
# DAY SEX.FEMALE SEX.MALE GROUP.A GROUP.B TOTAL
# 1 7/1/14 4 2 3 3 7
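Here, you simply summarise each column as a table if it's not numeric, or sum it if it is (for the TOTAL column). The table has to be wrapped in a list because summarise_each expects a single value per group, and a table with two levels has two values. Then unlist() flattens the result, which is expanded into a regular one-row data.frame. Because that last step flattens all groups together, it only gives this layout when there is a single DAY.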
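For several days with current dplyr and tidyr, one possible sketch (assuming dplyr 1.0 or later and tidyr 1.1 or later, where pivot_wider() accepts a scalar values_fill) counts each factor per DAY, spreads the levels into columns, and joins the pieces. The level names are no longer hard-coded, but the column names SEX, GROUP and TOTAL still are. The data frame is a retyped equivalent of the dput() output above; the intermediate names sex_wide, group_wide and totals are made up for this example.

```r
library(dplyr)
library(tidyr)

# Two-day sample data, equivalent to the dput() output above
data <- data.frame(
  DAY   = c("7/1/14", "7/1/14", "7/1/14", "8/1/14", "8/1/14", "8/1/14"),
  SEX   = factor(c("FEMALE", "FEMALE", "FEMALE", "FEMALE", "MALE", "MALE")),
  GROUP = factor(c("A", "B", "B", "A", "A", "B")),
  TOTAL = c(1L, 1L, 1L, 1L, 1L, 2L)
)

# Row counts per DAY and SEX, one column per level (0 where a level is absent)
sex_wide <- data %>%
  count(DAY, SEX) %>%
  pivot_wider(names_from = SEX, values_from = n, values_fill = 0L)

# Same for GROUP, prefixed to give GROUP_A / GROUP_B
group_wide <- data %>%
  count(DAY, GROUP) %>%
  pivot_wider(names_from = GROUP, values_from = n,
              values_fill = 0L, names_prefix = "GROUP_")

# The numeric column is summed per day
totals <- data %>%
  group_by(DAY) %>%
  summarise(TOTAL = sum(TOTAL))

# Stitch the three per-day tables together on DAY
res <- sex_wide %>%
  left_join(group_wide, by = "DAY") %>%
  left_join(totals, by = "DAY")
print(res)
```

Unlike the unlist() step above, this keeps one row per DAY, and the counts agree with the data.table output.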