I am really new to R and stackoverflow; apologies in advance for issues with my question.
I have two data frames
data.frame 1:
Product.ID Description Wholesale.Price
Prod1 Desc1 1.45
Prod Desc2 1.27
Prod3 Desc 3.62
Prod4 Desc4 2.15
Prod5 Desc5 2.87
Prod12 Desc6 2.53
Prod7 Desc7 2.20
Prod8 Desc8 2.60
Prod9 Desc9 3.68
data.frame 2:
Product.ID Description Wholesale.Price
Prod1 Desc1 1.45
Prod2 Desc2 1.27
Prod3 Desc3 3.62
Prod4 Desc4 1.57
Prod5 Desc5 2.87
Prod6 Desc6 2.53
Prod7 Desc7 2.20
Prod8 Desc8 3.21
Prod9 Desc9 1.81
I see that I can use merge(list_1, list_2) to print where all 3 columns of the two data frames match (which is very cool).
I am trying to find a means of printing out where there are discrepancies between Description and Wholesale.price between the two data frames based on Product.ID. I am not even sure how to visualize the discrepancies in a meaningful way.
Any assistance is most appreciated.
Here is a quick two liner. First read in the data from @bgoldst:
df1 <- data.frame(Product.ID=c('Prod1','Prod','Prod3','Prod4','Prod5','Prod12','Prod7','Prod8','Prod9'), Description=c('Desc1','Desc2','Desc','Desc4','Desc5','Desc6','Desc7','Desc8','Desc9'), Wholesale.Price=c(1.45,1.27,3.62,2.15,2.87,2.53,2.20,2.60,3.68), stringsAsFactors=F );
df2 <- data.frame(Product.ID=c('Prod1','Prod2','Prod3','Prod4','Prod5','Prod6','Prod7','Prod8','Prod9'), Description=c('Desc1','Desc2','Desc3','Desc4','Desc5','Desc6','Desc7','Desc8','Desc9'), Wholesale.Price=c(1.45,1.27,3.62,1.57,2.87,2.53,2.20,3.21,1.81), stringsAsFactors=F );
Now we want to merge it, but keep all the columns:
x <- merge(df1, df2, by = "Product.ID")
Now print out the columns with those that have a mismatch in price or description:
x[x$Description.x != x$Description.y | x$Wholesale.Price.x != x$Wholesale.Price.y, ]
Product.ID Description.x Wholesale.Price.x Description.y Wholesale.Price.y
2 Prod3 Desc 3.62 Desc3 3.62
3 Prod4 Desc4 2.15 Desc4 1.57
6 Prod8 Desc8 2.60 Desc8 3.21
7 Prod9 Desc9 3.68 Desc9 1.81
Let's rename the columns you want to compare:
names(list_1)[3] = "Price1"
names(list_2)[3] = "Price2"
Now we can merge and keep both price columns.
list_both = merge(list_1, list_2)
# calculate differences
list_both$difference = list_both$Price1 - list_both$Price2
# look at the top of the data
head(list_both)
# print out those with a difference
list_both[list_both$difference != 0, ]
For visualization, I'll leave you to explore a little on your own from here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With