The page linked below mentions the chance of collisions, but I am trying to use it to find duplicate entries:
http://www.cplusplus.com/reference/functional/hash/
I am using std::hash<std::string> and storing the return value in an std::unordered_set. If emplace fails, I mark the string as a duplicate.
Hashes are generally functions from a large space of values into a small space of values, e.g. from the space of all strings to 64-bit integers. There are a lot more strings than 64-bit integers, so obviously multiple strings can have the same hash. A good hash function is such that there's no simple rule relating strings with the same hash value.
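To make the collision point concrete, here is a minimal sketch using a deliberately weak, hypothetical hash (weak_hash is invented for illustration and is not how std::hash works). It just adds up the character codes, so any two strings made of the same characters in a different order collide:

```cpp
#include <cstddef>
#include <string>

// Hypothetical, deliberately weak hash: the sum of the character codes.
// It exists only to make collisions easy to see; std::hash is not
// implemented this way.
std::size_t weak_hash(const std::string& s) {
    std::size_t h = 0;
    for (unsigned char c : s) {
        h += c;
    }
    return h;
}
```

With this function, weak_hash("ab") and weak_hash("ba") return the same value even though the strings differ, so equal hashes alone cannot prove that two strings are equal. A good hash makes such collisions rare and hard to predict, but cannot rule them out.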
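So, when we want to use hashes to find duplicate strings (or duplicate anything), it's always a two-phase process (at least): first use the hash to narrow down the candidates, since only values with the same hash can be equal; then compare the candidates themselves to confirm that they really are equal.
std::unordered_set does this, and never mind the specifics. Note that it does this for you, so it's redundant for you to hash yourself and then store the result in an std::unordered_set. Store the strings themselves in the set instead; a set of bare hash values would report two different strings with colliding hashes as duplicates.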
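As a rough sketch of letting the container handle both the hashing and the comparison, the function below stores the strings themselves in an std::unordered_set<std::string>. The name find_duplicates and the choice to report every repeated occurrence are assumptions made for this example, not something from the question:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns every string in `input` that was already seen earlier,
// in input order (a string occurring three times is reported twice).
std::vector<std::string> find_duplicates(const std::vector<std::string>& input) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> duplicates;
    for (const std::string& s : input) {
        // insert() hashes s to pick a bucket, then compares s for equality
        // against the elements there. `.second` is false only when an
        // equal string is already stored, so a hash collision between
        // different strings is not reported as a duplicate.
        if (!seen.insert(s).second) {
            duplicates.push_back(s);
        }
    }
    return duplicates;
}
```

For the input {"a", "b", "a", "c", "b", "a"} this returns {"a", "b", "a"}. No explicit call to std::hash is needed, because the set uses std::hash<std::string> by default.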
Finally, note that there are other features one could use for initial duplicate screening, or for searching among the same-hash values. For example, string length: before comparing two strings character by character, check their lengths (which you should be able to access without iterating over the strings); different lengths -> non-equal strings.
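The length check can be sketched as follows. The helper name same_string is hypothetical; in practice operator== on std::string gives the same result, and standard library implementations commonly perform a similar size check internally:

```cpp
#include <cstddef>
#include <string>

// Compares two strings, rejecting on length before looking at characters.
bool same_string(const std::string& a, const std::string& b) {
    // size() is constant time for std::string, so this check is cheap.
    if (a.size() != b.size()) {
        return false;
    }
    // Same length: fall back to a character-by-character comparison.
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```

Here same_string("abc", "abcd") returns false without inspecting any characters, while same_string("abc", "abd") has to walk the strings to find the difference.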