Add rows to grouped data with dplyr?

Tags: dataframe, r, dplyr

My data is in a data.frame format like this sample data:

data <- 
structure(list(Article = structure(c(1L, 1L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("10004", "10006", "10007"), class = "factor"), 
Demand = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L, 
72L, 155L, 237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L, 
1L, 156L), Week = c("2013-W01", "2013-W01", "2013-W01", "2013-W01", 
"2013-W01", "2013-W02", "2013-W02", "2013-W02", "2013-W02", 
"2013-W02", "2013-W03", "2013-W03", "2013-W03", "2013-W03", 
"2013-W03", "2013-W04", "2013-W04", "2013-W04", "2013-W04", 
"2013-W04", "2013-W04", "2013-W04")), .Names = c("Article", 
"Demand", "Week"), class = "data.frame", row.names = c(NA, -22L))

I would like to summarize the demand column by week and article. To do this, I use:

library(dplyr)
WeekSums <- 
  data %>%
  group_by(Article, Week) %>%
  summarize(WeekDemand = sum(Demand))

But because some articles were not sold in certain weeks, the number of rows per article differs (only weeks with sales are shown in the WeekSums dataframe). How could I adjust my data so that each article has the same number of rows (one for each week), including weeks with 0 demand?

The output should then look like this:

   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0

I tried

WeekSums %>%
  group_by(Article) %>%
  if(n()< 4) rep(rbind(c(Article,NA,NA)), 4 - n() )

but this doesn't work. In my original approach, I solved the problem by merging a data frame of week numbers 1-4 with my raw data for each article. That gave me 4 weeks (rows) per article, but the for-loop implementation is very inefficient, so I'm trying to do the same with dplyr (or any other more efficient package or function). Any suggestions would be much appreciated!

asked May 04 '14 00:05

talat

2 Answers

Without dplyr it can be done like this:

as.data.frame(xtabs(Demand ~ Week + Article, data))

giving:

       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0

and this can be rewritten as a magrittr or dplyr pipeline like this:

data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()

The as.data.frame() at the end could be omitted if a wide form solution were desired.
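Note that the summed demand ends up in a column called Freq. If you want it named WeekDemand as in the question, the responseName argument of as.data.frame() for tables should do it. Here is a minimal sketch on a small made-up data frame (demo and its values are hypothetical, not the question's data), which also shows the wide form mentioned above:

```r
# Toy data (hypothetical values, not the question's data set)
demo <- data.frame(
  Article = factor(c("A", "A", "B")),
  Demand  = c(5L, 7L, 2L),
  Week    = c("W01", "W02", "W02")
)

# Long form: one row per Week/Article combination, zeros included,
# with the summed column named WeekDemand instead of the default Freq
long <- as.data.frame(xtabs(Demand ~ Week + Article, demo),
                      responseName = "WeekDemand")

# Wide form: weeks as rows, articles as columns
wide <- xtabs(Demand ~ Week + Article, demo)
```

Combinations that never occur in demo (here article B in week W01) show up with a sum of 0 in both forms.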

answered Sep 29 '22 13:09

G. Grothendieck

Since dplyr is under active development, I thought I would post an update that also incorporates tidyr:

library(dplyr)
library(tidyr)

data %>%
  expand(Article, Week) %>%                       # every Article/Week combination
  left_join(data, by = c("Article", "Week")) %>%  # missing combinations get NA Demand
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand, na.rm = TRUE))

Which produces:

   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0

Using tidyr >= 0.3.1 this can now be written as:

data %>% 
  complete(Article, Week) %>%  
  group_by(Article, Week) %>% 
  summarise(WeekDemand = sum(Demand, na.rm = TRUE))
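Another variant, if you would rather summarise first and fill in the gaps afterwards: complete() has a fill argument that replaces the NAs of the newly created rows. The sketch below uses a small made-up data frame (demo and week_sums are hypothetical names, not from the question). The ungroup() is there because, as far as I know, complete() works within groups on a grouped data frame, so the grouping left over from summarise() has to go first.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values, not the question's data set)
demo <- data.frame(
  Article = c("A", "A", "B"),
  Week    = c("W01", "W02", "W02"),
  Demand  = c(5, 7, 2)
)

week_sums <- demo %>%
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand)) %>%
  ungroup() %>%  # drop the remaining grouping before completing
  complete(Article, Week, fill = list(WeekDemand = 0))  # add missing pairs as 0
```

This way only the already aggregated table is expanded, which may matter if the raw data is large.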

answered Sep 29 '22 14:09

rrs
