I've following two data frame df_sales and df_supply.
I want to merge the sale to supply in such a manner that my df_sales table have DATE_SUPPLY and QNT_SUPPLY from df_supply on below conditions
*Condition: DATE_SUPPLY should be recent DATE_SUPPLY of corresponding "ITEM" for corresponding "STORE", i.e, DATE_SALE <- max(df_supply[df_supply$DATE_SUPPLY <= df_sales$DATE_SALE & df_supply$STORE == df_sales$STORE & df_supply$ITEM == df_sales$ITEM,]$DATE_SUPPLY)*
It can be possible using row apply function or simply by writing loop. But I have huge dataset so don't want looping.
df_sales <- data.frame("STORE"=c(1001,1001,1001,1001,1001,1002,1002,1002,1002,1002),"ITEM"=c(13048, 13057, 13082, 13048, 13057, 13145, 13166, 13229, 13057, 13048),"DATE_SALE"=as.Date(c("1/1/2014","1/1/2014","1/2/2014","1/2/2014","1/2/2014","1/2/2014","1/3/2014","1/3/2014","1/3/2014","1/4/2014"),"%m/%d/%Y"),"QNT_SALE"=c(1,1,1,1,1,1,1,1,1,1))
df_sales
STORE ITEM DATE_SALE QNT_SALE
1 1001 13048 2014-01-01 1
2 1001 13057 2014-01-01 1
3 1001 13082 2014-01-02 1
4 1001 13048 2014-01-02 1
5 1001 13057 2014-01-02 1
6 1002 13145 2014-01-02 1
7 1002 13166 2014-01-03 1
8 1002 13229 2014-01-03 1
9 1002 13057 2014-01-03 1
10 1002 13048 2014-01-04 1
df_supply <- data.frame("STORE"=c(1001,1002,1001,1001,1002,1002,1002,1002,1002,1002),"ITEM"=c(13048,13229,13057,13082,13145,13166,13229,13057,13048,13048),"DATE_SUPPLY"=as.Date(c("1/31/2013","1/31/2013","1/31/2013","1/1/2014","1/2/2014","1/2/2014","1/2/2014","1/2/2014","1/3/2014","2/1/2014"),"%m/%d/%Y"),"QNT_SUPPLY"=c(2,1,2,1,1,1,2,3,1,2))
df_supply
STORE ITEM DATE_SUPPLY CUM_QNT_SUPPLY
1 1001 13048 2013-01-31 2
2 1002 13229 2013-01-31 1
3 1001 13057 2013-01-31 2
4 1001 13082 2014-01-01 1
5 1002 13145 2014-01-02 1
6 1002 13166 2014-01-02 1
7 1002 13229 2014-01-02 2
8 1002 13057 2014-01-02 3
9 1002 13048 2014-01-03 1
10 1002 13048 2014-02-01 2
Output Required:
Sales Vs Supply
STORE ITEM DATE_SALE QNT_SALE DATE_SUPPLY QNT_SUPPLY
1 1001 13048 2014-01-01 1 2013-01-31 2
2 1001 13057 2014-01-01 1 2013-01-31 2
3 1001 13082 2014-01-02 1 2014-01-01 1
4 1001 13048 2014-01-02 1 2013-01-31 2
5 1001 13057 2014-01-02 1 2013-01-31 2
6 1002 13145 2014-01-03 1 2014-01-02 1
7 1002 13166 2014-01-03 1 2014-01-02 1
8 1002 13229 2014-01-03 1 2014-01-02 2
9 1002 13057 2014-01-03 1 2014-01-02 3
10 1002 13048 2014-01-04 1 2014-01-03 1
Using rolling joins from data.table:
require(data.table)
setkey(setDT(df_supply), STORE, ITEM, DATE_SUPPLY)
idx = df_supply[df_sales, roll=Inf, which=TRUE]
cbind(df_sales, df_supply[idx, 3:4])
# STORE ITEM DATE_SALE QNT_SALE DATE_SUPPLY QNT_SUPPLY
# 1 1001 13048 2014-01-01 1 2013-01-31 2
# 2 1001 13057 2014-01-01 1 2013-01-31 2
# 3 1001 13082 2014-01-02 1 2014-01-01 1
# 4 1001 13048 2014-01-02 1 2013-01-31 2
# 5 1001 13057 2014-01-02 1 2013-01-31 2
# 6 1002 13145 2014-01-02 1 2014-01-02 1
# 7 1002 13166 2014-01-03 1 2014-01-02 1
# 8 1002 13229 2014-01-03 1 2014-01-02 2
# 9 1002 13057 2014-01-03 1 2014-01-02 3
# 10 1002 13048 2014-01-04 1 2014-01-03 1
cbind returns an entirely new object. If instead you'd like to add the new columns by reference to df_sales use := instead. There are numerous examples of using it here on SO and also explained under the new HTML vignettes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With