I have a list of integers, for example 1,2,2,3,4,1. I need to be able to check for equivalence (==) between different lists.
However, I do not mean a simple element-wise comparison. Each of these lists actually denotes a set partition, where the position in the list is the index of an element and the number at that position is the index of its group. For example, in the list above, elements 0 and 5 are in the same group, elements 1 and 2 are in the same group, and elements 3 and 4 are each in their own group. The actual index of a group is not important, only the grouping.
I need to be able to test equivalence in this sense, so for example the previous list would be equivalent to 5,3,3,2,9,5, since they have the same grouping.
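To make this notion of equivalence concrete, here is a minimal C++ sketch that compares two lists by literally building the partition each one denotes, as a set of groups of element indices. It is not the efficient method being asked for; it just states the definition directly. The names `Partition`, `to_partition` and `same_partition` are illustrative, and the labels are assumed to be `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// A partition is represented here as a set of groups; each group is the set
// of element indices that share a label. The label values themselves are dropped.
using Partition = std::set<std::set<std::size_t>>;

Partition to_partition(const std::vector<int>& labels) {
    std::map<int, std::set<std::size_t>> groups;  // label -> element indices
    for (std::size_t i = 0; i < labels.size(); ++i) {
        groups[labels[i]].insert(i);
    }
    Partition result;
    for (const auto& entry : groups) {
        result.insert(entry.second);
    }
    return result;
}

// Two lists are equivalent when they induce the same partition of indices.
bool same_partition(const std::vector<int>& a, const std::vector<int>& b) {
    return to_partition(a) == to_partition(b);
}
```

Both example lists yield the groups {0,5}, {1,2}, {3}, {4}, so `same_partition` reports them as equivalent.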
The way I have been doing this is by reducing the array to a kind of normal form. I find all numbers having the same value as the first number and set them all to 0. I then continue along the list until I find a new number, find all numbers with the same value as this one and set them all to 1. I continue in this manner.
In my example, both lists would reduce to 0,1,1,2,3,0, and of course I can then just use a simple comparison to see if they are equivalent.
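For reference, the normalization described above might look like the following C++ sketch (`normalize_multipass` is a hypothetical name, and labels are assumed to be `int`). The inner loop rescans the rest of the list once per group, which is where the cost comes from: roughly O(N·G) for G groups, and O(N²) when every element is in its own group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multi-pass normalization: each time an element that has not been relabeled
// is found, it and every later element with the same value receive the next
// group number.
std::vector<int> normalize_multipass(const std::vector<int>& input) {
    std::vector<int> out(input.size(), -1);  // -1 marks "not yet relabeled"
    int next_group = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (out[i] != -1) continue;  // already part of an earlier group
        for (std::size_t j = i; j < input.size(); ++j) {
            if (input[j] == input[i]) out[j] = next_group;
        }
        ++next_group;
    }
    return out;
}
```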
However, this is quite slow, as I have to make several linear passes over the list. So, to cut to the chase: is there a more efficient way of reducing these lists to this normal form?
More generally, can I avoid this reduction altogether and compare the lists in a different, and perhaps more efficient, manner?
Implementation details
Try iterating through the two sequences in parallel, keeping a map (either std::map or an array) from values in the first array to values in the second, and vice versa. If you get to a pair that is not in your table, add it, unless there is already something in the table for either that first or second number (since that would indicate inequality). For example:
1,2,2,3,4,1
5,3,3,2,9,5
You would add 1->5, 2->3, 3->2, and 4->9 and the comparison would pass. For something slightly different:
5,3,3,2,9,5
1,2,2,3,2,1
you would add 5->1, 3->2, 2->3, then 9->2 would fail since there is already a binding for 2 in the second sequence; thus, you would know that the sequences were not equivalent.
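A minimal C++ sketch of this parallel scan, assuming `int` labels and using `std::unordered_map` for both directions (the function name `equivalent` is illustrative). With hash maps the comparison runs in expected O(N); `std::map` or label-indexed arrays can be substituted without changing the logic.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Walks both lists in parallel, recording the value binding in both
// directions and failing as soon as a pair contradicts an existing binding.
bool equivalent(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() != b.size()) return false;
    std::unordered_map<int, int> forward;   // value in a -> value in b
    std::unordered_map<int, int> backward;  // value in b -> value in a
    for (std::size_t i = 0; i < a.size(); ++i) {
        auto f = forward.find(a[i]);
        auto g = backward.find(b[i]);
        if (f == forward.end() && g == backward.end()) {
            // Neither value has been seen: record the new binding.
            forward[a[i]] = b[i];
            backward[b[i]] = a[i];
        } else if (f == forward.end() || g == backward.end() || f->second != b[i]) {
            // One side is already bound to a different value.
            return false;
        }
    }
    return true;
}
```

Because both maps are always updated together, a pair that is found on both sides only needs the forward check; a pair found on just one side is always a mismatch, as with 9->2 in the second example.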
For creating a hash function, you would probably need to do the normalization that you are doing, but it should require only one pass through the sequence. Again, keep maps in both directions, but if you find an unknown element in the input sequence, map it to the next available number, and otherwise use the map to transform the input sequence into a normalized one.
For an alphabet of K symbols and an array of N of these symbols, you should be able to produce the signature (or canonical representation) of the array in expected O(N) time using a hash table, or in O(N log K) using a binary search tree.
The trick is to perform the conversion of all elements in one pass:
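```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

std::vector<std::size_t> signature_of(const std::vector<std::size_t>& array) {
    std::unordered_map<std::size_t, std::size_t> map;
    std::vector<std::size_t> signature;
    signature.reserve(array.size());
    for (std::size_t i : array) {
        // insert() only inserts if the key is absent and returns
        // std::pair<iterator, bool>; the iterator points at {key: i, value: index}.
        // map.size() is read before inserting, so a new key gets the next index.
        std::size_t index = map.insert({i, map.size()}).first->second;
        signature.push_back(index);
    }
    return signature;
}
```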
The hash of the array is then the hash of its signature.
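The standard library does not provide `std::hash` for `std::vector<std::size_t>`, so using a signature as a key in `std::unordered_set` or `std::unordered_map` needs a custom hash. A sketch, assuming a boost-style `hash_combine` mixing step (one common choice, not the only one); the names `Signature`, `signature_of` and `SignatureHash` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Signature = std::vector<std::size_t>;

// One-pass canonical form: each new value gets the next unused index.
Signature signature_of(const std::vector<std::size_t>& array) {
    std::unordered_map<std::size_t, std::size_t> index_of;
    Signature signature;
    signature.reserve(array.size());
    for (std::size_t value : array) {
        signature.push_back(index_of.insert({value, index_of.size()}).first->second);
    }
    return signature;
}

// Order-sensitive hash of a signature, using the boost::hash_combine recipe.
struct SignatureHash {
    std::size_t operator()(const Signature& s) const {
        std::size_t seed = s.size();
        for (std::size_t x : s) {
            seed ^= std::hash<std::size_t>{}(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};
```

With `std::unordered_set<Signature, SignatureHash>`, arrays that denote the same partition collapse to a single entry.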
But more fundamentally, there is no reason not to put all arrays in their canonical representation once and for all.