Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing data frames in R

Tags:

r

compare

I am really new to R and stackoverflow; apologies in advance for issues with my question.

I have two data frames

data.frame 1:

Product.ID Description Wholesale.Price
Prod1      Desc1       1.45
Prod       Desc2       1.27
Prod3      Desc        3.62
Prod4      Desc4       2.15
Prod5      Desc5       2.87
Prod12     Desc6       2.53
Prod7      Desc7       2.20
Prod8      Desc8       2.60
Prod9      Desc9       3.68

data.frame 2:

Product.ID Description Wholesale.Price
Prod1      Desc1       1.45
Prod2      Desc2       1.27
Prod3      Desc3       3.62
Prod4      Desc4       1.57
Prod5      Desc5       2.87
Prod6      Desc6       2.53
Prod7      Desc7       2.20
Prod8      Desc8       3.21
Prod9      Desc9       1.81

I see that I can use merge(list_1, list_2) to print where all 3 columns of the two data frames match (which is very cool).

I am trying to find a means of printing out where there are discrepancies between Description and Wholesale.price between the two data frames based on Product.ID. I am not even sure how to visualize the discrepancies in a meaningful way.

Any assistance is most appreciated.

like image 821
aguadamuz Avatar asked Apr 27 '26 05:04

aguadamuz


2 Answers

Here is a quick two liner. First read in the data from @bgoldst:

df1 <- data.frame(Product.ID=c('Prod1','Prod','Prod3','Prod4','Prod5','Prod12','Prod7','Prod8','Prod9'), Description=c('Desc1','Desc2','Desc','Desc4','Desc5','Desc6','Desc7','Desc8','Desc9'), Wholesale.Price=c(1.45,1.27,3.62,2.15,2.87,2.53,2.20,2.60,3.68), stringsAsFactors=F );
df2 <- data.frame(Product.ID=c('Prod1','Prod2','Prod3','Prod4','Prod5','Prod6','Prod7','Prod8','Prod9'), Description=c('Desc1','Desc2','Desc3','Desc4','Desc5','Desc6','Desc7','Desc8','Desc9'), Wholesale.Price=c(1.45,1.27,3.62,1.57,2.87,2.53,2.20,3.21,1.81), stringsAsFactors=F );

Now we want to merge it, but keep all the columns:

x <- merge(df1, df2, by = "Product.ID")

Now print out the columns with those that have a mismatch in price or description:

x[x$Description.x != x$Description.y | x$Wholesale.Price.x != x$Wholesale.Price.y, ]


  Product.ID Description.x Wholesale.Price.x Description.y Wholesale.Price.y
2      Prod3          Desc              3.62         Desc3              3.62
3      Prod4         Desc4              2.15         Desc4              1.57
6      Prod8         Desc8              2.60         Desc8              3.21
7      Prod9         Desc9              3.68         Desc9              1.81
like image 115
jeremycg Avatar answered Apr 29 '26 01:04

jeremycg


Let's rename the columns you want to compare:

names(list_1)[3] = "Price1"
names(list_2)[3] = "Price2"

Now we can merge and keep both price columns.

list_both = merge(list_1, list_2)

# calculate differences
list_both$difference = list_both$Price1 - list_both$Price2

# look at the top of the data
head(list_both)

# print out those with a difference
list_both[list_both$difference != 0, ]

For visualization, I'll leave you to explore a little on your own from here.

like image 40
Gregor Thomas Avatar answered Apr 29 '26 01:04

Gregor Thomas



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!