I want to have a dictionary that assigns a value to a set of integers.
For example key
is [1 2 3]
and value
will have certain value.
The thing is that [3 2 1]
needs to be treated the same in my case so hash needs to be equal, if I go with hash approach.
The set will have 2 to 10 items.
Sum of items is usually fixed so we cannot make hashcode according to sum, which is a first natural idea here.
Not a homework task, actually facing this problem in my code.
This set is basically IEnumerable<int>
in C# so any data structure is fine to store them.
Any help appreciated. Performance is pretty important here too.
An immediate thought: we could sum up items^2
and already get some kind of better hash, but still I would like to hear some thoughts.
EDIT: hmm really sorry guys, everyone suggests ordering, didn't come to my mind that I needed to say that actually ordering and hashing is the current solution I use and I am considering faster alternatives.
Basically all of the approaches here are instantiations of the same template. Map x1, …, xn to f(x1) op … op f(xn), where op is a commutative associative operation on some set X, and f is a map from items to X. This template has been used a couple of times in ways that are provably good.
Choose a random large prime p and a random residue b in [1, p - 1]. Let f(x) = bx mod p and let op be addition. We essentially interpret a set as a polynomial and use the Schwartz–Zippel lemma to bound the probability of a collision (= the probability that a nonzero polynomial has b as a root mod p).
Let op be XOR and let f be a randomly chosen table. This is Zobrist hashing and minimizes in expectation the number of collisions by straightforward linear-algebraic arguments.
Modular exponentiation is slow, so don't use it. As for Zobrist hashing, with 3 million items, the table f probably won't fit into L2, though it does set an upper bound of one main-memory access.
I would instead take Zobrist hashing as a departure point and look for a cheap function f that behaves like a random function. This is essentially the job description of a non-cryptographic pseudorandom generator – I would try computing f by seeding a fast PRG with x and generating one value.
EDIT: given that the sets all have the same sums, don't choose f to be a degree 1 polynomial (e.g., the step function of a linear congruential generator).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With