Checking if one data frame is a reorder of another data frame [duplicate]

Question

I have two data frames that were generated on two different occasions, but I suspect they are equal. Both have the same number of rows and columns, and visually they seem to be the same, except for how the rows are ordered.

Neither has an ID column by which I could reorder; the best I can do is reorder both by a process_number variable, which is the closest I can get to a unique column. However, even after that reorder, identical yields FALSE and all.equal gives me this (summarized):

 [1] "Component 2: 32 string mismatches"
[16] "Component 18: 'is.NA' value mismatch: 183357 in current 183357 in target"
[23] "Component 27: Mean relative difference: 0.4688722"
[24] "Component 28: Mean relative difference: 0.0004968944"
[26] "Component 30: Attributes: < Component 2: 365 string mismatches >"
[28] "Component 31: 'current' is not a factor"

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

The best option I've found for these cases is to use the "compare" package:

library(compare)
compare(df1, df2, allowAll = TRUE)

The allowAll argument tries different transformations (for example, reordering rows, reordering columns, changing column types from factors to characters, and so on) and then reports whether the two inputs are the same once those transformations have been applied. If they are, it also tells you which transformations were required to make them the same.

Richie Cotton · Answer

Your method is correct.

all.equal is telling you that your data frames are not reorderings of each other.

For more details, try examining the mismatches in one of the flagged columns:

mismatch_in_col_2 <- data1[, 2] != data2[, 2]
cbind(data1[mismatch_in_col_2, 2], data2[mismatch_in_col_2, 2])

(Repeat for the other columns with differences.)
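To avoid repeating that check by hand, the comparison can be wrapped in a loop over columns. This is only a sketch: mismatched_rows is a hypothetical helper, not part of base R or any package, and it assumes both data frames have the same dimensions and column order. It compares values as character strings, so NA against NA counts as a match and a factor can be compared with a character column, but very small floating-point differences may be hidden.

```r
# Hypothetical helper: for each column, return the row indices where the two
# data frames differ, treating NA vs. NA as a match.
mismatched_rows <- function(data1, data2) {
  stopifnot(identical(dim(data1), dim(data2)))
  result <- lapply(seq_along(data1), function(j) {
    # Compare as character so factor and character columns are comparable
    x <- as.character(data1[[j]])
    y <- as.character(data2[[j]])
    # A row differs if exactly one side is NA, or both are present and unequal
    differs <- (is.na(x) != is.na(y)) | (!is.na(x) & !is.na(y) & x != y)
    which(differs)
  })
  names(result) <- names(data1)
  # Keep only the columns that contain at least one mismatch
  result[lengths(result) > 0]
}

# Made-up example data, for illustration only
data1 <- data.frame(a = c(1, 2, NA), b = c("x", "y", "z"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(a = c(1, 5, NA), b = c("x", "y", "w"),
                    stringsAsFactors = FALSE)

mismatches <- mismatched_rows(data1, data2)
print(mismatches)
```

The returned list has one entry per differing column, so data1[mismatches$a, ] and data2[mismatches$a, ] show the offending rows side by side.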

You mentioned that process_number "is the closest I can get to a unique column". Perhaps some of the difference relates to ties being ordered in a different way. Is there a second column you can sort on?
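If no single extra column breaks the ties, one option is to sort on every column at once. The sketch below uses a hypothetical helper, sort_all_columns, and made-up data; it assumes both data frames already have the same columns in the same order and with the same types (a factor in one and a character column in the other would still be reported as a difference by all.equal).

```r
# Hypothetical helper: sort a data frame by every column, left to right,
# so that ties in one column are broken by the next.
sort_all_columns <- function(df) {
  # unname() keeps column names from being matched to order()'s own arguments
  sorted <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
  rownames(sorted) <- NULL  # the original row names differ after sorting
  sorted
}

# Made-up example: df2 is df1 with its rows in a different order
df1 <- data.frame(process_number = c(1, 1, 2),
                  value = c("b", "a", "c"),
                  stringsAsFactors = FALSE)
df2 <- df1[c(3, 2, 1), ]

same_rows <- isTRUE(all.equal(sort_all_columns(df1), sort_all_columns(df2)))
print(same_rows)
```

Resetting the row names matters here, because all.equal also compares attributes and would otherwise report the reordered row names as a mismatch.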

Tags: sorting, dataframe, r, compare

Waldir Leoncio
