
How to check if two data frames are equal [duplicate]

Say I have large datasets in R and I just want to know whether two of them are the same. I do this often when I'm experimenting with different algorithms to achieve the same result. For example, say we have the following datasets:

    df1 <- data.frame(num = 1:5, let = letters[1:5])
    df2 <- df1
    df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
    df4 <- df3

So this is what I do to compare them:

    table(x == y, useNA = 'ifany')

Which works great when the datasets have no NAs:

    > table(df1 == df2, useNA = 'ifany')

    TRUE
      10

But not so much when they have NAs:

    > table(df3 == df4, useNA = 'ifany')

    TRUE <NA>
      11    1

In this example, it's easy to dismiss the NA as not a problem, since we know that both data frames are equal. The problem is that NA == <anything> yields NA, so whenever one of the datasets has an NA, it doesn't matter what the other one has in that same position: the result is always going to be NA.
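To make the NA behaviour concrete, here is a minimal sketch using the df3 and df4 defined above. The name `cmp` is introduced only for illustration.

```r
# NA means "unknown", so comparing it with anything (even another NA) is unknown too
NA == 1     # NA
NA == NA    # NA

df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# Element-wise comparison gives a logical matrix with NA wherever either side is NA
cmp <- df3 == df4

sum(cmp, na.rm = TRUE)  # cells that compared as TRUE
sum(is.na(cmp))         # cells that could not be compared at all
```

The second count is exactly the <NA> column that table() reports, which is why the element-wise approach cannot tell "both NA" apart from "NA versus some other value".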

So using table() to compare datasets doesn't seem ideal to me. How can I better check if two data frames are identical?

P.S.: Note this is not a duplicate of R - comparing several datasets, Comparing 2 datasets in R or Compare datasets in R

Waldir Leoncio asked Oct 01 '13 14:10




2 Answers

Look up all.equal(). It comes with some caveats, but it might work for you.

    all.equal(df3, df4)
    # [1] TRUE
    all.equal(df2, df1)
    # [1] TRUE
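One of those caveats is that all.equal() does not return FALSE when the objects differ; it returns a character vector describing the differences. If the result is going to be used as a condition, a common pattern is to wrap it in isTRUE(). A small sketch, where `same_12` and `same_13` are names made up for this example:

```r
df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])

# isTRUE() collapses "TRUE or a description of the differences" into a single logical
same_12 <- isTRUE(all.equal(df1, df2))
same_13 <- isTRUE(all.equal(df1, df3))

if (!same_13) {
  message("df1 and df3 differ")
}
```

Without isTRUE(), `if (all.equal(df1, df3))` would fail with an error, because the condition would be a character vector rather than a logical.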
TheComeOnMan answered Sep 19 '22 23:09


As Metrics pointed out, one could also use identical() to compare the datasets. The difference between this approach and that of Codoremifa is that identical() will just yield TRUE or FALSE, depending on whether the objects being compared are identical or not, whereas all.equal() will return either TRUE or hints about the differences between the objects. For instance, consider the following:

    > identical(df1, df3)
    [1] FALSE

    > all.equal(df1, df3)
    [1] "Attributes: < Component 2: Numeric: lengths (5, 6) differ >"
    [2] "Component 1: Numeric: lengths (5, 6) differ"
    [3] "Component 2: Lengths: 5, 6"
    [4] "Component 2: Attributes: < Component 2: Lengths (5, 6) differ (string compare on first 5) >"
    [5] "Component 2: Lengths (5, 6) differ (string compare on first 5)"
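Another difference worth keeping in mind is that all.equal() compares numeric values up to a small tolerance, while identical() requires an exact match. The following sketch uses two hypothetical data frames, `a` and `b`, that differ only by floating-point rounding:

```r
# 0.1 + 0.2 is not exactly 0.3 in floating-point arithmetic
a <- data.frame(x = c(0.1 + 0.2, 1))
b <- data.frame(x = c(0.3, 1))

exact  <- identical(a, b)           # exact comparison
approx <- isTRUE(all.equal(a, b))   # comparison within the default tolerance
```

Here `exact` comes out FALSE and `approx` comes out TRUE, so which function is the "better" check depends on whether tiny numeric differences between two algorithms should count as a mismatch.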

Moreover, from what I've tested, identical() seems to run much faster than all.equal().
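For anyone who wants to check the speed difference on their own machine, here is a rough timing sketch using base R's system.time(). The data frames `big1` and `big2`, the size `n`, and the seed are arbitrary choices for illustration, and the timings will vary between machines and R versions.

```r
set.seed(1)
n <- 1e5
v <- rnorm(n)

# Build the two data frames separately so that they are equal in content
# but are not simply the same object bound to two names
big1 <- data.frame(id = seq_len(n), val = v)
big2 <- data.frame(id = seq_len(n), val = v + 0)

t_identical <- system.time(res_identical <- identical(big1, big2))
t_all_equal <- system.time(res_all_equal <- isTRUE(all.equal(big1, big2)))

print(t_identical)
print(t_all_equal)
```

A single system.time() call is a coarse measurement, so repeating the comparison several times (or on larger data) gives a more reliable picture.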

Waldir Leoncio answered Sep 16 '22 23:09