Test whether a dataframe is a sorted version of another dataframe

Is it feasible to test whether some dataframe is simply a sorted version of another dataframe? For example, if I have two dataframes a and b, is there some way to easily determine whether a is simply a reordered version of b (or vice versa)?

Here's a trivial example:

a <- data.frame(x1=1:10, x2=11:20, x3=1:2)
b <- a[order(a$x3, a$x1, decreasing=TRUE),]

The closest thing I can think of is all.equal, but its output is not helpful (to me, at least):

> all.equal(a,b)
[1] "Attributes: < Component 2: Mean relative difference: 0.9545455 >"
[2] "Component 1: Mean relative difference: 0.9545455"                
[3] "Component 2: Mean relative difference: 0.3387097"                
[4] "Component 3: Mean relative difference: 0.6666667"

I imagine there is some obvious way to do this that is eluding me. I'm looking for a general solution that would scale well to many variables and many observations (the example above is only a demonstration).

Also: Ideally, such a function would also identify whether a is a subset of b (or vice versa).

I would explore the "compare" package:

library(compare)
compare(a, b, allowAll=TRUE)
# TRUE
#   sorted

Here, the output shows that compare had to sort the rows before it found the two data frames to be the same.

Here's a slightly more complicated example, with factors coerced to character, rows reordered, and columns reordered:

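# stringsAsFactors = TRUE makes x4 a factor (the default became FALSE in R 4.0.0)
a <- data.frame(x1=1:10, x2=11:20, x3=1:2, x4 = letters[1:10], stringsAsFactors = TRUE)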
b <- with(a, a[order(x3, x1, decreasing=TRUE), ])
b$x4 <- as.character(b$x4)
b <- b[c(4, 1, 3, 2)]

Here's the result of compare:

compare(a, b, allowAll=TRUE)
# TRUE
#   reordered columns
#   [x4] coerced from <character> to <factor>
#   sorted
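
If installing a package is not an option, the three transformations reported above (reordered columns, factor coerced to character, sorted rows) can be roughly imitated in base R. This is only a sketch under my own assumptions: normalise_df is a hypothetical helper name, not part of compare, and it only handles these three differences.

```r
# Hypothetical helper: bring a data frame into a canonical form
normalise_df <- function(df) {
  # 1. Put the columns in a fixed (alphabetical) order
  df <- df[sort(names(df))]
  # 2. Turn factors into character vectors so that both types compare equal
  df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)
  # 3. Sort the rows by every column; unname() prevents a column called
  #    e.g. "decreasing" from being taken as an argument of order()
  df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
  # 4. Drop the row names, which still record the old row positions
  rownames(df) <- NULL
  df
}

a <- data.frame(x1 = 1:10, x2 = 11:20, x3 = 1:2, x4 = letters[1:10],
                stringsAsFactors = TRUE)
b <- a[order(a$x3, a$x1, decreasing = TRUE), ]
b$x4 <- as.character(b$x4)
b <- b[c(4, 1, 3, 2)]

same <- identical(normalise_df(a), normalise_df(b))
print(same)
```

Unlike compare, this does not tell you which transformations were needed; it only answers yes or no.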

You can sort both data frames by all of their columns and then use identical (with a and b as defined in the question):

identical(a[do.call(order, a), ], b[do.call(order, b), ])
#[1] TRUE
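
The same idea can be wrapped in a function, and a rough subset test (the last part of the question) can be built next to it. Treat this as a sketch: same_rows, row_keys and rows_subset are names I made up. The row names are reset because identical also compares them, and the subset test builds its keys with paste, so it assumes the values survive conversion to character and never contain the "\r" separator. It also ignores how often a row is repeated.

```r
a <- data.frame(x1 = 1:10, x2 = 11:20, x3 = 1:2)
b <- a[order(a$x3, a$x1, decreasing = TRUE), ]

# Hypothetical helper: TRUE if a and b hold the same rows in any order
same_rows <- function(a, b) {
  # Different shapes or column names cannot be a pure reordering of rows
  if (nrow(a) != nrow(b) || !identical(names(a), names(b))) return(FALSE)
  # Sort both data frames by all columns, left to right
  sa <- a[do.call(order, unname(as.list(a))), , drop = FALSE]
  sb <- b[do.call(order, unname(as.list(b))), , drop = FALSE]
  # Reset row names so that only the cell values are compared
  rownames(sa) <- NULL
  rownames(sb) <- NULL
  identical(sa, sb)
}

# Hypothetical helper: one character key per row (assumes no "\r" in the data)
row_keys <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Hypothetical helper: TRUE if every row of a also occurs somewhere in b
rows_subset <- function(a, b) {
  identical(names(a), names(b)) && all(row_keys(a) %in% row_keys(b))
}

same_rows(a, b)            # TRUE
rows_subset(b[1:3, ], a)   # TRUE
rows_subset(a, b[1:3, ])   # FALSE
```

For very large data frames the sort dominates the cost, so it is worth checking nrow and names first, as same_rows does.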

Tags: comparison, sorting, dataframe, r

Thomas

2 Answers

A5C1D2H2I1M1N2O1R2T1

Sven Hohenstein
