Here I have integers 1:7
for four different partitions, i.e., {1}, {2,3,4}, {5,6}, and {7} and those partitions are written in a list, i.e., list(1,c(2,3,4),c(5,6),7)
. I treat the partitions as sets, such that different permutation of elements within one partition should be recognized as the same one. For example, list(1,c(2,3,4),c(5,6),7)
and list(7,1,c(2,3,4),c(6,5))
are equivalent.
Note that, there is no repetition for elements in list, e.g., no list(c(1,2),c(2,1),c(1,2))
, since this problem is discussing exclusive partitions over the whole set.
I listed some of the different permutations into the list lst
as below
lst <- list(list(1,c(2,3,4),c(5,6),7),
list(c(2,3,4),1,7,c(5,6)),
list(1,c(2,3,4),7,c(6,5)),
list(7,1,c(3,2,4),c(5,6)))
and what I want to do is to verify all permutations are equivalent. If yes, then we get result TRUE
.
What I did so far is to sort the elements within each partition, and used setdiff()
with interset()
and union()
to judge it (see my code below)
s <- Map(function(v) Map(sort,v),lst)
equivalent <- length(setdiff(Reduce(union,s),Reduce(intersect,s),))==0
However, I guess this method would be slow whenever the partition size scales up. Is there any faster approach to make it? Appreciated in advance!
# should return `TRUE`
lst1 <- list(list(1,c(2,3,4),c(5,6)),
list(c(2,3,4),1,c(5,6)),
list(1,c(2,3,4),c(6,5)))
# should return `TRUE`
lst2 <- list(list(1:2, 3:4), list(3:4, 1:2))
# should return `FALSE`
lst3 <- list(list(1,c(2,3,4),c(5,6)), list(c(2,3,4),1,c(5,6)), list(1,c(2,3,5),c(6,4)))
For the two lists to be equal, each element of the first list should be equal to the second list's corresponding element. If the two lists have the same elements, but the sequence is not the same, they will not be considered equal or identical lists.
You can convert the list to a set. A set cannot have duplicates. So if all the elements in the original list are identical, the set will have just one element. if len(set(input_list)) == 1: # input_list has all identical elements.
Python sort() method and == operator to compare lists Further, the == operator is used to compare the list, element by element.
A: Select the list you want to check to the Find values in textbox, and select the list you want to check based on into the According to textbox. B: check Each row in the Based on section. C: check Same values in Find section.
The program will print 'Equal' if all elements are the same else it will print 'Not Equal'. This example uses a built-in Python set datatype. Set does not allow duplicate elements. It also requires all of your elements to be hashable. Using this property of set, we can check whether all the elements in a list are the same or not.
Method 1: Compare Two Lists Using Equal Sign Operator Method 2: Match Data by Using Row Difference Technique Method 3: Match Row Difference by Using IF Condition Method 4: Match Data Even If There is a Row Difference
The fastest way is to hash the list you want to check, and then check each item in there. This is why Joe Koberg's answer is fast: checking set intersection is very fast. When you don't have a lot of data though, making sets can be a waste of time.
A post about R
and any variant of fast isn't complete without a solution featuring rcpp.
To maximize efficiency, picking the correct data structure will be of utmost importance. Our data structure needs to store unique values and also have fast insert/access. This is exactly what std::unordered_set embodies. We only need to determine how we can uniquely identify each vector
of unordered integers
.
The FTA states that every number can be uniquely represented (up to the order of the factors) by the product of primes numbers.
Here is an example demonstrating how we can use the FTA to quickly decipher if two vectors are equivalent up to order (N.B. P
below is a list of primes numbers... (2, 3, 5, 7, 11, etc.)
:
Maps to Maps to product
vec1 = (1, 2, 7) -->> P[1], P[2], P[7] --->> 2, 3, 17 -->> 102
vec2 = (7, 3, 1) -->> P[7], P[3], P[1] --->> 17, 5, 2 -->> 170
vec3 = (2, 7, 1) -->> P[2], P[7], P[1] --->> 3, 17, 2 -->> 102
From this, we see that vec1
and vec3
correctly map to the same number, whereas vec2
gets mapped to a different value.
Since our actual vectors may contain up to a hundred integers less than 1000, applying the FTA will yield extremely large numbers. We can get around this by taking advantage of the product rule of logarithm:
logb(xy) = logb(x) + logb(y)
With this at our disposal, we will be able to tackle much larger numbers example (This starts deteriorating on extremely large examples).
First, we need a simple prime number generator (N.B. We are actually generating the log of each prime number).
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
void getNPrimes(std::vector<double> &logPrimes) {
const int n = logPrimes.size();
const int limit = static_cast<int>(2.0 * static_cast<double>(n) * std::log(n));
std::vector<bool> sieve(limit + 1, true);
int lastP = 3;
const int fsqr = std::sqrt(static_cast<double>(limit));
while (lastP <= fsqr) {
for (int j = lastP * lastP; j <= limit; j += 2 * lastP)
sieve[j] = false;
int ind = 2;
for (int k = lastP + 2; !sieve[k]; k += 2)
ind += 2;
lastP += ind;
}
logPrimes[0] = std::log(2.0);
for (int i = 3, j = 1; i <= limit && j < n; i += 2)
if (sieve[i])
logPrimes[j++] = std::log(static_cast<double>(i));
}
And here is the main implementation:
// [[Rcpp::export]]
bool f_Rcpp_Hash(List x) {
List tempLst = x[0];
const int n = tempLst.length();
int myMax = 0;
// Find the max so we know how many primes to generate
for (int i = 0; i < n; ++i) {
IntegerVector v = tempLst[i];
const int tempMax = *std::max_element(v.cbegin(), v.cend());
if (tempMax > myMax)
myMax = tempMax;
}
std::vector<double> logPrimes(myMax + 1, 0.0);
getNPrimes(logPrimes);
double sumMax = 0.0;
for (int i = 0; i < n; ++i) {
IntegerVector v = tempLst[i];
double mySum = 0.0;
for (auto j: v)
mySum += logPrimes[j];
if (mySum > sumMax)
sumMax = mySum;
}
// Since all of the sums will be double values and we want to
// ensure that they are compared with scrutiny, we multiply
// each sum by a very large integer to bring the decimals to
// the right of the zero and then convert them to an integer.
// E.g. Using the example above v1 = (1, 2, 7) & v2 = (7, 3, 1)
//
// sum of log of primes for v1 = log(2) + log(3) + log(17)
// ~= 4.62497281328427
//
// sum of log of primes for v2 = log(17) + log(5) + log(2)
// ~= 5.13579843705026
//
// multiplier = floor(.Machine$integer.max / 5.13579843705026)
// [1] 418140173
//
// Now, we multiply each sum and convert to an integer
//
// as.integer(4.62497281328427 * 418140173)
// [1] 1933886932 <<-- This is the key for v1
//
// as.integer(5.13579843705026 * 418140173)
// [1] 2147483646 <<-- This is the key for v2
const uint64_t multiplier = std::numeric_limits<int>::max() / sumMax;
std::unordered_set<uint64_t> canon;
canon.reserve(n);
for (int i = 0; i < n; ++i) {
IntegerVector v = tempLst[i];
double mySum = 0.0;
for (auto j: v)
mySum += logPrimes[j];
canon.insert(static_cast<uint64_t>(multiplier * mySum));
}
const auto myEnd = canon.end();
for (auto it = x.begin() + 1; it != x.end(); ++it) {
List tempLst = *it;
if (tempLst.length() != n)
return false;
for (int j = 0; j < n; ++j) {
IntegerVector v = tempLst[j];
double mySum = 0.0;
for (auto k: v)
mySum += logPrimes[k];
const uint64_t key = static_cast<uint64_t>(multiplier * mySum);
if (canon.find(key) == myEnd)
return false;
}
}
return true;
}
Here are the results when applied to lst1, lst2, lst3, & lst (the large one)
given by @GKi.
f_Rcpp_Hash(lst)
[1] TRUE
f_Rcpp_Hash(lst1)
[1] TRUE
f_Rcpp_Hash(lst2)
[1] FALSE
f_Rcpp_Hash(lst3)
[1] FALSE
And here are some benchmarks with the units
parameter set to relative
.
microbenchmark(check = 'equal', times = 10
, unit = "relative"
, f_ThomsIsCoding(lst3)
, f_chinsoon12(lst3)
, f_GKi_6a(lst3)
, f_GKi_6b(lst3)
, f_Rcpp_Hash(lst3))
Unit: relative
expr min lq mean median uq max neval
f_ThomsIsCoding(lst3) 84.882393 63.541468 55.741646 57.894564 56.732118 33.142979 10
f_chinsoon12(lst3) 31.984571 24.320220 22.148787 22.393368 23.599284 15.211029 10
f_GKi_6a(lst3) 7.207269 5.978577 5.431342 5.761809 5.852944 3.439283 10
f_GKi_6b(lst3) 7.399280 5.751190 6.350720 5.484894 5.893290 8.035091 10
f_Rcpp_Hash(lst3) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10
microbenchmark(check = 'equal', times = 10
, unit = "relative"
, f_ThomsIsCoding(lst)
, f_chinsoon12(lst)
, f_GKi_6a(lst)
, f_GKi_6b(lst)
, f_Rcpp_Hash(lst))
Unit: relative
expr min lq mean median uq max neval
f_ThomsIsCoding(lst) 199.776328 202.318938 142.909407 209.422530 91.753335 85.090838 10
f_chinsoon12(lst) 9.542780 8.983248 6.755171 9.766027 4.903246 3.834358 10
f_GKi_6a(lst) 3.169508 3.158366 2.555443 3.731292 1.902140 1.649982 10
f_GKi_6b(lst) 2.992992 2.943981 2.019393 3.046393 1.315166 1.069585 10
f_Rcpp_Hash(lst) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10
About 3x faster than the fastest solution yet on the larger example.
What does this mean?
To me, this result speaks volumes to the beauty and efficiency of base R
as displayed by @GKi, @chinsoon12, @Gregor, @ThomasIsCoding, and more. We wrote around 100 lines of very specific C++
to obtain a moderate speed up. To be fair, the base R
solutions end up calling mostly compiled code and end up utilizing hash tables as we did above.
After sorting you can use duplicated
and all
.
s <- lapply(lst, function(x) lapply(x, sort)) #Sort vectors
s <- lapply(s, function(x) x[order(vapply(x, "[", 1, 1))]) #Sort lists
all(duplicated(s)[-1]) #Test if there are all identical
#length(unique(s)) == 1 #Alternative way to test if all are identical
Alternative: Sort in one loop
s <- lapply(lst, function(x) {
tt <- lapply(x, sort)
tt[order(vapply(tt, "[", 1, 1))]
})
all(duplicated(s)[-1])
Alternative: Sort during loop and allow early exit
s <- lapply(lst[[1]], sort)
s <- s[order(vapply(s, "[", 1, 1))]
tt <- TRUE
for(i in seq(lst)[-1]) {
x <- lapply(lst[[i]], sort)
x <- x[order(vapply(x, "[", 1, 1))]
if(!identical(s, x)) {
tt <- FALSE
break;
}
}
tt
or using setequal
s <- lapply(lst[[1]], sort)
tt <- TRUE
for(i in seq(lst)[-1]) {
x <- lapply(lst[[i]], sort)
if(!setequal(s, x)) {
tt <- FALSE
break;
}
}
tt
or improving slightly the idea from @chinsoon12 to exchange the list with a vector!
s <- lst[[1]][order(vapply(lst[[1]], min, 1))]
s <- rep(seq_along(s), lengths(s))[order(unlist(s))]
tt <- TRUE
for(i in seq(lst)[-1]) {
x <- lst[[i]][order(vapply(lst[[i]], min, 1))]
x <- rep(seq_along(x), lengths(x))[order(unlist(x))]
if(!identical(s, x)) {tt <- FALSE; break;}
}
tt
or avoid the second order
s <- lst[[1]][order(vapply(lst[[1]], min, 1))]
s <- rep(seq_along(s), lengths(s))[order(unlist(s))]
y <- s
tt <- TRUE
for(i in seq(lst)[-1]) {
x <- lst[[i]][order(vapply(lst[[i]], min, 1))]
y <- y[0]
y[unlist(x)] <- rep(seq_along(x), lengths(x))
if(!identical(s, y)) {tt <- FALSE; break;}
}
tt
or exchange order
with match
(or fmatch
)
x <- lst[[1]]
s <- "[<-"(integer(),unlist(x),rep(seq_along(x), lengths(x)))
s <- match(s, unique(s))
tt <- TRUE
for(i in seq(lst)[-1]) {
x <- lst[[i]]
y <- "[<-"(integer(),unlist(x),rep(seq_along(x), lengths(x)))
y <- match(y, unique(y))
if(!identical(s, y)) {tt <- FALSE; break;}
}
tt
Or without early exit.
s <- lapply(lst, function(x) {
y <- "[<-"(integer(),unlist(x),rep(seq_along(x), lengths(x)))
match(y, unique(y))
})
all(duplicated(s)[-1])
or written in C++
sourceCpp(code = "#include <Rcpp.h>
#include <vector>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
bool f_GKi_6_Rcpp(const List &x) {
const List &x0 = x[0];
const unsigned int n = x0.length();
unsigned int nn = 0;
for (List const &i : x0) {nn += i.length();}
std::vector<int> s(nn);
for (unsigned int i=0; i<n; ++i) {
const IntegerVector &v = x0[i];
for (int const &j : v) {
if(j > nn) return false;
s[j-1] = i;
}
}
{
std::vector<int> lup(n, -1);
int j = 0;
for(int &i : s) {
if(lup[i] < 0) {lup[i] = j++;}
i = lup[i];
}
}
for (List const &i : x) {
if(i.length() != n) return false;
std::vector<int> sx(nn);
for(unsigned int j=0; j<n; ++j) {
const IntegerVector &v = i[j];
for (int const &k : v) {
if(k > nn) return false;
sx[k-1] = j;
}
}
{
std::vector<int> lup(n, -1);
int j = 0;
for(int &i : sx) {
int &lupp = lup[i];
if(lupp == -1) {lupp = j; i = j++;
} else {i = lupp;}
}
}
if(s!=sx) return false;
}
return true;
}
")
Thanks to @Gregor for hints to improve the answer!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With