Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interview : Hash function: sine function

Tags:

c

trigonometry

I was asked this interview question. I am not sure what the correct answer for it is (and the reasoning behind the answer):

Is sin(x) a good hash function?

like image 276
agentx Avatar asked Oct 10 '12 21:10

agentx


4 Answers

Not really.

  1. It's horribly slow.
  2. You'll need to convert the result to some integer type anyway to avoid the insanity of floating-point equality comparisons. (Not actually the usual precision problems that are endemic to FP equality comparisons and which arise from calculating two things slightly different ways; I mean specifically the problems caused by things like the fact that 387-derived FPUs store extra bits of precision in their registers, so if a comparison is done between two freshly-calculated values in registers you could get a different answer than if exactly one of the operands was loaded into a register from memory.)
  3. It's almost flat near the peaks and troughs, so the quantisation step (multiplying by some large number and rounding to an integer) will produce many hash values near the min and max, rather than an even distribution.
like image 36
j_random_hacker Avatar answered Nov 06 '22 06:11

j_random_hacker


If you mean sin(), it's not a good hashing function because:

  • it's quite predictable and for some x it's no better than just x itself. There should be no seemingly apparent relationship between the key and the hash of the key.
  • it does not produce an integer value. You cannot index/subscript arrays with floating-point indices and there must be some kind of array in the hash table.
  • floating-point is very implementation-specific and even if you make a hash function out of sin(), it may not work with a different compiler or on a different kind of CPU/computer.
  • sin() may be much slower than some simpler integer-arithmetic function.
like image 138
Alexey Frunze Avatar answered Nov 06 '22 07:11

Alexey Frunze


Based off of mathematical knowledge:

Sine(x) is periodic so it's going to reach the same number from different values of x, so Sine(x) would be awful as a hashing function because you will get multiple values hashing to the exact same point. There are **infinitely many values between 0 and pi for the return value, but then past that the values will repeat. So 0 & pi & 2*pi will all hash to the same point.

If you could make the increment small enough and have Sine(x) multiplied by say x^2 or something of that nature it'd be mediocre at best, but then again, if you were to do that why not just use x^2 anyway and toss out the periodic function all together.

**infinitely: a large enough number that I'm not willing to count.

NOTE: Sine(x) will have values that are small and could be affected by rounding error.

NOTE: Any value taken from a sine function should be multiplied by an integer and then either modded or the floor or ceiling taken so that the value can be used as an array offset, etc.

like image 1
Jeff Langemeier Avatar answered Nov 06 '22 06:11

Jeff Langemeier


sin(x) is trigonometric function which repeats itself after every 360 degrees, so it's going to be a poor hash function as the hash will be repeated too often.

A simple refutation:

sin(0) == sin(360) == sin(720) == sin(..)

This is not a property of a goodhash function.

Even if you decide to use it, it's difficult to represent the value returned by sin. Sin function:

sin x = x - x^3/3! + x^5/5! - ...

This can't accurately represented due to floating point precision issue, which means for a same value it may produce two different hashes!

like image 1
P.P Avatar answered Nov 06 '22 06:11

P.P