My data is in a data.frame format like this sample data:
data <-
structure(list(Article = structure(c(1L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("10004", "10006", "10007"), class = "factor"),
Demand = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L,
72L, 155L, 237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L,
1L, 156L), Week = c("2013-W01", "2013-W01", "2013-W01", "2013-W01",
"2013-W01", "2013-W02", "2013-W02", "2013-W02", "2013-W02",
"2013-W02", "2013-W03", "2013-W03", "2013-W03", "2013-W03",
"2013-W03", "2013-W04", "2013-W04", "2013-W04", "2013-W04",
"2013-W04", "2013-W04", "2013-W04")), .Names = c("Article",
"Demand", "Week"), class = "data.frame", row.names = c(NA, -22L))
I would like to summarize the demand column by week and article. To do this, I use:
library(dplyr)
WeekSums <-
  data %>%
  group_by(Article, Week) %>%
  summarize(
    WeekDemand = sum(Demand)
  )
But because some articles were not sold in certain weeks, the number of rows per article differs (only weeks with sales are shown in the WeekSums dataframe). How could I adjust my data so that each article has the same number of rows (one for each week), including weeks with 0 demand?
The output should then look like this:
   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0
I tried
WeekSums %>%
group_by(Article) %>%
if(n()< 4) rep(rbind(c(Article,NA,NA)), 4 - n() )
but this doesn't work. In my original approach, I solved this by merging a data frame of week numbers 1-4 with my raw data for each article. That gave me 4 weeks (rows) per article, but the for-loop implementation is very inefficient, so I am trying to do the same with dplyr (or any other more efficient package or function). Any suggestions would be much appreciated!
Without dplyr it can be done like this:
as.data.frame(xtabs(Demand ~ Week + Article, data))
giving:
       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0
and this can be rewritten as a magrittr or dplyr pipeline like this:
data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()
The as.data.frame() at the end could be omitted if a wide form solution were desired.
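If the default Freq column name is inconvenient, it can be renamed during the conversion. The following is a minimal sketch that relies on the responseName argument of as.data.frame() for tables. The reduced sample data is made up for illustration and only mirrors the column names of the question's data.

```r
# Hypothetical reduced sample with the same columns as the question's data
data <- data.frame(
  Article = factor(c("10004", "10004", "10007", "10004", "10006")),
  Demand  = c(26L, 780L, 2L, 214L, 5L),
  Week    = c("2013-W01", "2013-W01", "2013-W01", "2013-W02", "2013-W04"),
  stringsAsFactors = FALSE
)

# xtabs() sums Demand over every Week x Article combination and puts 0
# where a combination never occurs
tab <- xtabs(Demand ~ Week + Article, data)

# responseName replaces the default "Freq" column name
WeekSums <- as.data.frame(tab, responseName = "WeekDemand")

# Reorder the columns to match the layout asked for in the question
WeekSums <- WeekSums[, c("Article", "Week", "WeekDemand")]
print(WeekSums)
```

Note that Week comes back as a factor here, because as.data.frame() converts the table dimensions to factors by default.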
Since dplyr is under active development, I thought I would post an update that also incorporates tidyr:
library(dplyr)
library(tidyr)
data %>%
  expand(Article, Week) %>%
  left_join(data, by = c("Article", "Week")) %>%
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand, na.rm = TRUE))
Which produces:
   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0
Using tidyr >= 0.3.1 this can now be written as:
data %>%
  complete(Article, Week) %>%
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand, na.rm = TRUE))
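As a possible variation, complete() can also be applied after summarising, using its fill argument to put 0 into the missing combinations instead of relying on na.rm. This is only a sketch: it assumes dplyr >= 1.0.0 (for the .groups argument of summarise()) and uses a made-up reduced sample with the question's column names.

```r
library(dplyr)
library(tidyr)

# Hypothetical reduced sample with the same columns as the question's data
data <- data.frame(
  Article = factor(c("10004", "10004", "10007", "10004", "10006")),
  Demand  = c(26L, 780L, 2L, 214L, 5L),
  Week    = c("2013-W01", "2013-W01", "2013-W01", "2013-W02", "2013-W04"),
  stringsAsFactors = FALSE
)

WeekSums <- data %>%
  group_by(Article, Week) %>%
  # Drop the grouping so that complete() works across all articles,
  # not within each article separately
  summarise(WeekDemand = sum(Demand), .groups = "drop") %>%
  # Add the Article x Week combinations that are missing and set their
  # WeekDemand to 0 instead of NA
  complete(Article, Week, fill = list(WeekDemand = 0L))

print(WeekSums)
```

Summarising first keeps the completed table small, which may matter when the raw data has many rows per article and week.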