Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing an efficient binary search in Erlang

P.S. this is my first entry to SO, if I'm doing anything wrong please let me know and I'll try to fix it. I've spent pretty much all day trying to fix this algorithm implementation, and I'm at my wit's end!

A class I am currently taking is asking for 3 implementations of the same binary search algorithm in 3 substantially different languages. For one of them, I decided to go the hard way and take a shot at Erlang. Here is my code:

-export(main/1).

main(_) ->
    io:format("Running trials... ~n", []),
    N = [2 bsl (2 * X + 6) || X <- lists:seq(0, 7)],
    lists:foreach(fun(X) -> io:format("~p: ~p ~n", [X, time_search(X)]) end, N).

time_search(Size) ->
    A = list_to_tuple(create_list(Size)),
    Iterations = 2000000,
    time_search(A, Size + 1, 0, Iterations) / Iterations.

time_search(_, _, Sum, 0) -> Sum;
time_search(A, Key, Sum, IterationNum) ->
    Before = now(),
    bin_search(A, Key),
    time_search(A, Key, Sum + timer:now_diff(now(), Before), IterationNum - 1).

create_list(Size) -> lists:seq(1, Size).

bin_search(A, Key) -> bin_search(A, Key, 1, tuple_size(A)).

bin_search(_, _, Lower, Upper) when Lower > Upper -> false;
bin_search(A, Key, Lower, Upper) ->
    Mid = (Upper + Lower) div 2,
    Item = element(Mid, A),
    if
        Key > Item -> bin_search(A, Key, Mid + 1, Upper);
        Key < Item -> bin_search(A, Key, Lower, Mid - 1);
        true -> true
    end.

So this is the output:

128: 250.8094435 µs
512: 308.7264845 µs
2048: 368.5770285 µs
8192: 425.748134 µs
32768: 477.6267655 µs
131072: 533.340876 µs
524288: 601.023178 µs
2097152: 661.099404 µs

But when compared to my ruby implementation, it is is clearly orders of magnitude off from O(lg n)

EDIT: Ok, so maybe not orders...it is pretty logarithmic it seems, but it still seems wrong...

Ruby:

128: 10.4756 µs
512: 11.8172 µs
2048: 13.5328 µs
8192: 15.3139 µs
32768: 17.0915 µs
131072: 18.8721 µs
524288: 21.5237 µs
2097152: 21.7187 µs

Previously I was using Lists, but I learned that those don't have O(1) retrieval time, so I switched to Tuples. What is causing my Erlang to run so inefficiently?

Just in case, here is my Ruby implementation

def p5_BinarySearch
  n = Array.new(8){ |i| 2 << (2 * i + 6) }
  n.each { |x|
    time = timeSearch x
    puts "#{x}: #{time.round 4} µs"
  }
end

def timeSearch(size)
  values = createArray size
  iterations = 2e7.to_int
  totalTime = 0
  iterations.times{
    before = Time.now
    binarySearch(values, size + 1)
    totalTime += Time.now - before
  }
  totalTime * 1e7 / iterations
end

def createArray(size)
  (1..size).to_a
end

def binarySearch(values, key, low = 0, high = values.length - 1)
  return false if low > high
  mid = (low + high) / 2
  return binarySearch(values, key, mid + 1, high) if key > values[mid]
  return binarySearch(values, key, low, mid - 1) if key < values[mid]
  true
end

if __FILE__ == $0
  puts "Running trials..."
  p5_BinarySearch
end
like image 286
Connor Clark Avatar asked Mar 22 '23 09:03

Connor Clark


1 Answers

The Algorithm looks good but using escript interpreter is the problem in this case. Use escript -c to run the escript so that the script is compiled before running rather than interpreted or you can create an Erlang module as suggested by rvirding. With this you can see that it executes really fast.

Also better to use os:timestamp() rather than using now() to get the accurate timing values. Even better for accuracy is to use the timer:tc function.

{Time,_} = timer:tc(fun bin_search/2, [A, Key]),
time_search(A, Key, Sum + Time, IterationNum - 1).

The results were something like this on my machine. (without -c was taking around 450 microsec)

128: 1.105
512: 1.2005
2048: 1.4015
8192: 1.5145
32768: 1.7225
131072: 1.8515
524288: 2.024
2097152: 2.188
like image 67
Vinod Avatar answered Mar 31 '23 16:03

Vinod