I have two data frames, one for store and one for sales:
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"))
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))
store
#StoreID StoreName
# 1 McDonalds
# 2 A&W
# 3 Burger King
# 4 Wendy's
sales
#StoreID ItemID SalesQty
# 1 2 10
# 2 2 20
# 1 3 30
# 1 4 40
# 2 4 50
# 2 5 60
I want to merge them, so that I can see StoreName for each sales transaction:
merged <- merge(sales, store, by = "StoreID")
merged
#StoreID ItemID SalesQty StoreName
# 1 2 10 McDonalds
# 1 3 30 McDonalds
# 1 4 40 McDonalds
# 2 2 20 A&W
# 2 4 50 A&W
# 2 5 60 A&W
Now I want to know for each StoreName in the merged data frame, how many different items were sold:
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
#A&W Burger King McDonalds Wendy's
# 3 NA 3 NA
My question is, why does the tapply result show "Burger King" and "Wendy's" even though they aren't in the merged data frame?
This is because store$StoreName
is a factor
. When creating the store data frame, setting the argument stringsAsFactor
to FALSE
will ensure those store names with no matching element in sales
are dropped during the merge
.
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"), stringsAsFactors = FALSE)
merged <- merge(sales, store, by = "StoreID")
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
#A&W McDonalds
# 3 3
You can try this too:
merged$StoreName <- factor(merged$StoreName)
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
# A&W McDonalds
# 3 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With