Is it feasible to test whether some dataframe is simply a sorted version of another dataframe? For example, if I have two dataframes a and b, is there some way to easily determine whether a is simply a reordered version of b (or vice versa)?
Here's a trivial example:
a <- data.frame(x1=1:10, x2=11:20, x3=1:2)
b <- a[order(a$x3, a$x1, decreasing=TRUE),]
The closest thing I can think of is all.equal, but its output is not helpful (to me, at least):
> all.equal(a,b)
[1] "Attributes: < Component 2: Mean relative difference: 0.9545455 >"
[2] "Component 1: Mean relative difference: 0.9545455"
[3] "Component 2: Mean relative difference: 0.3387097"
[4] "Component 3: Mean relative difference: 0.6666667"
I imagine there is some obvious way to do this that is eluding me. I'm looking for a general solution that scales well to many variables and many observations (the example above is only for demonstration).
Also: Ideally, such a function would also identify whether a is a subset of b (or vice versa).
I would explore the "compare" package:
library(compare)
compare(a, b, allowAll=TRUE)
# TRUE
# sorted
The "sorted" line in the output shows that compare had to sort the rows before the two data frames matched.
Here's a slightly more complicated example, with factors coerced to character, rows reordered, and columns reordered:
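a <- data.frame(x1=1:10, x2=11:20, x3=1:2, x4 = letters[1:10], stringsAsFactors = TRUE)
# stringsAsFactors = TRUE makes x4 a factor (this was the default before R 4.0.0)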
b <- with(a, a[order(x3, x1, decreasing=TRUE), ])
b$x4 <- as.character(b$x4)
b <- b[c(4, 1, 3, 2)]
Here's the result of compare:
compare(a, b, allowAll=TRUE)
# TRUE
# reordered columns
# [x4] coerced from <character> to <factor>
# sorted
You can sort both data frames by all of their columns and then use identical:
identical(a[do.call(order, a), ], b[do.call(order, b), ])
#[1] TRUE
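One caveat with the one-liner: it returns TRUE here because b kept the row names it inherited from a. If b was built independently, or its columns are in a different order, identical will report FALSE even when the rows match. Below is a minimal base-R sketch that handles those cases and also covers the subset part of the question. The names is_reordered, row_keys and is_row_subset are hypothetical helpers made up for illustration, not functions from any package. The subset check compares rows as pasted strings, which is an assumption that suits integer, character and factor columns better than floating-point ones.

```r
# Hypothetical helpers (not from any package); base R only.

# TRUE if `b` holds exactly the rows of `a`, in any row or column order.
is_reordered <- function(a, b) {
  if (!identical(dim(a), dim(b)) || !setequal(names(a), names(b))) {
    return(FALSE)
  }
  b <- b[names(a)]  # align column order with `a`
  # unname() keeps column names from being matched to order()'s own arguments
  a <- a[do.call(order, unname(as.list(a))), , drop = FALSE]
  b <- b[do.call(order, unname(as.list(b))), , drop = FALSE]
  rownames(a) <- NULL  # row names record the original order, so drop them
  rownames(b) <- NULL
  identical(a, b)
}

# One string key per row. Assumption: no cell contains "\r".
row_keys <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# TRUE if every row of `a` also occurs in `b` (duplicate counts are ignored).
is_row_subset <- function(a, b) {
  all(names(a) %in% names(b)) && all(row_keys(a) %in% row_keys(b[names(a)]))
}

a <- data.frame(x1 = 1:10, x2 = 11:20, x3 = 1:2)
b <- a[order(a$x3, a$x1, decreasing = TRUE), ]
rownames(b) <- NULL  # pretend b was built independently of a

is_reordered(a, b)
# TRUE
is_row_subset(a[1:3, ], b)
# TRUE
is_reordered(a[1:3, ], b)
# FALSE
```

Note that identical is strict about types, so an integer column in one data frame and a numeric column in the other count as different. Coerce the columns first if that distinction does not matter for your data.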