
R rolling up rows to a single row (continuous & factor variables)

Tags: r, dplyr

I am trying to roll up all the rows for one day into a single row, ideally with dplyr. I know my code is far from correct, but this is as far as I got:

data %>%
  group_by(DAY) %>%
  summarise_each(funs(Sum = n()), SEX, GROUP, TOTAL)

Original:

DAY     SEX     GROUP   TOTAL
7/1/14  FEMALE  A       1
7/1/14  FEMALE  B       1
7/1/14  FEMALE  B       1
7/1/14  FEMALE  A       1
7/1/14  MALE    A       1
7/1/14  MALE    B       2

New:

DAY     FEMALE  MALE    GROUP_A GROUP_B TOTAL
7/1/14  4       2       3       3       7  
yokota asked Jul 17 '15 07:07



2 Answers

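Another way, using data.table, tested on a data.frame with more than one day (the sample data is at the end of this answer).

library(data.table)
# For each DAY: count the levels of SEX and GROUP, sum TOTAL, and return them as one row
setDT(data)[, as.list(c(table(SEX), table(GROUP), TOTAL = sum(TOTAL))), by = DAY]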

#      DAY FEMALE MALE A B TOTAL
#1: 7/1/14      3    0 1 2     3
#2: 8/1/14      1    2 2 1     4
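
To see what the expression inside `[...]` produces for a single day, here is a minimal base R sketch that runs it on the three 7/1/14 rows of the sample data. The names `one_day`, `counts` and `one_row` are made up for this illustration and are not part of the answer above.

```r
# Rows for a single day, copied from the sample data in this answer
one_day <- data.frame(
  SEX   = factor(c("FEMALE", "FEMALE", "FEMALE"), levels = c("FEMALE", "MALE")),
  GROUP = factor(c("A", "B", "B"), levels = c("A", "B")),
  TOTAL = c(1L, 1L, 1L)
)

# table() counts every factor level (including levels with zero rows);
# c() glues the counts and the sum into one named vector
counts <- with(one_day, c(table(SEX), table(GROUP), TOTAL = sum(TOTAL)))

# as.list() turns the named vector into one list element per output column,
# which is the shape data.table expects for a single result row
one_row <- as.list(counts)
print(counts)
```

The names of `counts` (FEMALE, MALE, A, B, TOTAL) become the column names, which is why MALE shows up as 0 for a day with no male rows.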

EDIT: another, less manual, option (you don't need to know which variables are factors and which are numeric), thanks to some help from @jangorecki and @DavidArenburg

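# Logical masks over the columns; [-1] drops DAY (the first column)
# because .SD does not contain the grouping column
wh_num  <- sapply(data, is.numeric)[-1]
wh_fact <- sapply(data, is.factor)[-1]
# Per DAY: tabulate the factor columns, sum the numeric ones, flatten into one row
setDT(data)[, as.list(c(lapply(.SD[, wh_fact, with = FALSE], table),
                        lapply(.SD[, wh_num, with = FALSE], sum),
                        recursive = TRUE)), by = DAY]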

#      DAY SEX.FEMALE SEX.MALE GROUP.A GROUP.B TOTAL
#1: 7/1/14          3        0       1       2     3
#2: 8/1/14          1        2       2       1     4

Data:

data <- structure(list(DAY = c("7/1/14", "7/1/14", "7/1/14", "8/1/14", 
"8/1/14", "8/1/14"), SEX = structure(c(1L, 1L, 1L, 1L, 2L, 2L
), .Label = c("FEMALE", "MALE"), class = "factor"), GROUP = structure(c(1L, 
2L, 2L, 1L, 1L, 2L), .Label = c("A", "B"), class = "factor"), 
    TOTAL = c(1L, 1L, 1L, 1L, 1L, 2L)), .Names = c("DAY", "SEX", 
"GROUP", "TOTAL"), row.names = c(NA, -6L), class = "data.frame")
Cath answered Oct 07 '22 00:10


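It may seem a little arcane, but here is a short incantation (`dat` is the data frame from the question):

# Note: summarise_each() and funs() are deprecated in recent dplyr releases
# Sum numeric columns; wrap a table of counts in a list for everything else
res <- dat %>% group_by(DAY) %>%
  summarise_each(funs(ifelse(is.numeric(.), sum(.), list(table(.)))))

# Flatten the list columns into one named row and re-attach DAY
data.frame(DAY=res$DAY, t(unlist(res[, 2:ncol(res)])))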
#      DAY SEX.FEMALE SEX.MALE GROUP.A GROUP.B TOTAL
# 1 7/1/14          4        2       3       3     7

Here, each column is summarised as a table of counts if it is not numeric, or summed if it is (the TOTAL column). The table has to be wrapped in a list because summarise_each expects a single value per column and group. Finally, unlist() flattens those list entries into one named vector, which is transposed and expanded into a regular data.frame.
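
The same rule (sum a column if it is numeric, tabulate it otherwise) can also be sketched in base R, which may help if your dplyr version warns about `summarise_each()` or `funs()`. This is only an illustration, not the method used above: the helper `roll_up()` and the variable `pieces` are hypothetical names, and `dat` is assumed to be the question's data with SEX and GROUP stored as factors.

```r
# The question's data, rebuilt here so the sketch is self-contained
dat <- data.frame(
  DAY   = rep("7/1/14", 6),
  SEX   = factor(c("FEMALE", "FEMALE", "FEMALE", "FEMALE", "MALE", "MALE")),
  GROUP = factor(c("A", "B", "B", "A", "A", "B")),
  TOTAL = c(1L, 1L, 1L, 1L, 1L, 2L)
)

# Hypothetical helper: collapse the rows of one day into a one-row data.frame
roll_up <- function(d) {
  parts <- lapply(names(d), function(nm) {
    col <- d[[nm]]
    if (is.numeric(col)) {
      # Numeric column: keep its name and sum it
      setNames(sum(col), nm)
    } else {
      # Any other column: one count per level, named like SEX.FEMALE
      counts <- table(col)
      setNames(as.vector(counts), paste(nm, names(counts), sep = "."))
    }
  })
  as.data.frame(as.list(unlist(parts)))
}

# Split the non-DAY columns by day, roll each piece up, and stack the rows
pieces <- lapply(split(dat[, -1], dat$DAY), roll_up)
res <- do.call(rbind, pieces)
res <- data.frame(DAY = names(pieces), res, row.names = NULL)
print(res)
```

On the question's six rows this should give the same single row as the output above (4 FEMALE, 2 MALE, 3 in each group, TOTAL 7). Because the columns are factors, a level with no rows on a given day is still reported as 0, so every day ends up with the same set of columns.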

Rorschach answered Oct 07 '22 01:10