
Add rows to grouped data with dplyr?

Tags: dataframe, r, dplyr

My data is in a data.frame format like this sample data:

data <- 
structure(list(Article = structure(c(1L, 1L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("10004", "10006", "10007"), class = "factor"), 
Demand = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L, 
72L, 155L, 237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L, 
1L, 156L), Week = c("2013-W01", "2013-W01", "2013-W01", "2013-W01", 
"2013-W01", "2013-W02", "2013-W02", "2013-W02", "2013-W02", 
"2013-W02", "2013-W03", "2013-W03", "2013-W03", "2013-W03", 
"2013-W03", "2013-W04", "2013-W04", "2013-W04", "2013-W04", 
"2013-W04", "2013-W04", "2013-W04")), .Names = c("Article", 
"Demand", "Week"), class = "data.frame", row.names = c(NA, -22L))

I would like to summarize the demand column by week and article. To do this, I use:

library(dplyr)
WeekSums <- 
  data %>%
   group_by(Article, Week) %>%
   summarize(
    WeekDemand = sum(Demand)
   )

But because some articles were not sold in certain weeks, the number of rows per article differs (only weeks with sales are shown in the WeekSums dataframe). How could I adjust my data so that each article has the same number of rows (one for each week), including weeks with 0 demand?

The output should then look like this:

   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0

I tried

WeekSums %>%
  group_by(Article) %>%
  if(n()< 4) rep(rbind(c(Article,NA,NA)), 4 - n() )

but this doesn't work. In my original approach, I solved this by merging a data frame of week numbers 1-4 with my raw data for each article. That gave me 4 weeks (rows) per article, but the implementation with a for loop is very inefficient, so I'm trying to do the same with dplyr (or any other more efficient package/function). Any suggestions would be much appreciated!

talat asked May 04 '14 00:05


2 Answers

Without dplyr it can be done like this:

as.data.frame(xtabs(Demand ~ Week + Article, data))

giving:

       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0

and this can be rewritten as a magrittr or dplyr pipeline like this:

data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()

The as.data.frame() at the end could be omitted if a wide form solution were desired.
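
To get the column names and row order asked for in the question, the xtabs result can be tidied up in base R. The following is only a sketch: the small `data` frame is a hypothetical stand-in for the question's sample data, and `responseName` is the argument of `as.data.frame()` for tables that names the value column.

```r
# Hypothetical stand-in for the question's data (values are made up)
data <- data.frame(
  Article = factor(c("10004", "10004", "10006", "10007")),
  Week    = c("2013-W01", "2013-W02", "2013-W02", "2013-W01"),
  Demand  = c(26L, 214L, 2L, 5L)
)

# Cross-tabulate; combinations with no sales get 0.
# responseName renames the default "Freq" column.
WeekSums <- as.data.frame(
  xtabs(Demand ~ Week + Article, data),
  responseName = "WeekDemand",
  stringsAsFactors = FALSE
)

# Reorder columns and rows to match the layout requested in the question
WeekSums <- WeekSums[order(WeekSums$Article, WeekSums$Week),
                     c("Article", "Week", "WeekDemand")]
rownames(WeekSums) <- NULL

print(WeekSums)
```

Note that `stringsAsFactors = FALSE` returns Article and Week as character columns; drop it if factors are preferred.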

G. Grothendieck answered Sep 29 '22 13:09


Since dplyr is under active development, I thought I would post an update that also incorporates tidyr:

library(dplyr)
library(tidyr)

data %>%
  expand(Article, Week) %>%
  left_join(data, by = c("Article", "Week")) %>%
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand, na.rm=TRUE))

Which produces:

   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0

Using tidyr >= 0.3.1 this can now be written as:

data %>% 
  complete(Article, Week) %>%  
  group_by(Article, Week) %>% 
  summarise(Demand = sum(Demand, na.rm = TRUE))
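
A possible variation, sketched here under assumptions rather than taken from the answer above: summarise first, then let `complete()` fill the missing Article/Week combinations with 0 through its `fill` argument. The small `data` frame is a hypothetical stand-in for the question's sample data. The `ungroup()` call is there because `summarise()` leaves the result grouped by Article, and recent tidyr versions do not complete a grouping column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data (values are made up)
data <- data.frame(
  Article = factor(c("10004", "10004", "10006", "10007")),
  Week    = c("2013-W01", "2013-W02", "2013-W02", "2013-W01"),
  Demand  = c(26L, 214L, 2L, 5L)
)

WeekSums <- data %>%
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand)) %>%  # one row per observed pair
  ungroup() %>%                            # drop the remaining Article grouping
  complete(Article, Week,
           fill = list(WeekDemand = 0L))   # add unobserved pairs with 0 demand

print(WeekSums)
```

Completing after summarising means only the already aggregated rows are expanded, and no `na.rm = TRUE` is needed in the sum.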

rrs answered Sep 29 '22 14:09