Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inner join of two data frames still showing all values

Tags:

join

dataframe

r

I have two data frames, one for store and one for sales:

store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"))
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))  

store    
#StoreID   StoreName
#      1   McDonalds
#      2         A&W
#      3 Burger King
#      4     Wendy's 

sales  
#StoreID ItemID SalesQty  
#      1      2       10  
#      2      2       20  
#      1      3       30  
#      1      4       40  
#      2      4       50  
#      2      5       60  

I want to merge them, so that I can see StoreName for each sales transaction:

merged <- merge(sales, store, by = "StoreID")

merged
#StoreID ItemID SalesQty StoreName  
#      1      2       10 McDonalds  
#      1      3       30 McDonalds
#      1      4       40 McDonalds
#      2      2       20       A&W
#      2      4       50       A&W
#      2      5       60       A&W

Now I want to know for each StoreName in the merged data frame, how many different items were sold:

tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

#A&W Burger King   McDonalds     Wendy's 
#  3          NA           3          NA 

My question is, why does the tapply result show "Burger King" and "Wendy's" even though they aren't in the merged data frame?

like image 680
javamoss Avatar asked Feb 09 '17 04:02

javamoss


2 Answers

This is because store$StoreName is a factor. When creating the store data frame, setting the argument stringsAsFactor to FALSE will ensure those store names with no matching element in sales are dropped during the merge.

sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))  
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"), stringsAsFactors = FALSE)
merged <- merge(sales, store, by = "StoreID")
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

  #A&W McDonalds 
  #  3         3 
like image 171
tic-toc-choc Avatar answered Oct 08 '22 08:10

tic-toc-choc


You can try this too:

merged$StoreName <- factor(merged$StoreName)
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

#  A&W McDonalds 
#    3         3
like image 1
Sandipan Dey Avatar answered Oct 08 '22 07:10

Sandipan Dey