I would like to implement a binary search algorithm in Perl. My 'array' is sorted in decreasing order (not an actual array, but a function that gets an index and returns values). the problem is that there might be stretches of identical values. If my searched value is in such a stretch, I want to return the first index that contains it.
This is what I have written:
# get_val should be a *decreasing* function for idexes $i in min..max,
# formally: for any $i,$j s.t. $max>=$i>$j>=$min :
# $get_val_subref($i, $extra) <= $get_val_subref($j, $extra)
# min and max are the inclusive boundaries for the search
# get_val sub should get an index in min..max and an extra data reference, and return
# the value for the given index
# returns the smallest index $i in min..max for which $get_val_subref($j, $extra)
# returns $searched_val, or undef if no such index exists
sub binary_search {
my ( $min, $max, $searched_val, $get_val_subref, $get_val_sub_extra_data )
= @_;
my ( $mid, $val );
while ( $min <= $max ) {
$mid = $min + int( ( $max - $min ) / 2 );
$val = $get_val_subref->( $mid, $get_val_sub_extra_data );
if ( $val > $searched_val ) {
$min = $mid + 1;
}
elsif ( $val < $searched_val ) {
$max = $mid - 1;
}
else { ## SEE MY QUESTION BELOW ##
# surely $val == $searched_val, but is it the first one?
if ( $mid > $min
and $get_val_subref->( $mid - 1, $get_val_sub_extra_data )
== $searched_val )
{
# $val == $searched_val and prev($val) == $searched_val
# we have to continue
$max = $mid - 1;
}
else {
# $val == $searched_val and prev($val) != $searched_val
# wer'e done
return $mid;
}
}
}
# $val was not found. return undef
return undef;
}
and this is a simple example for using it:
sub get_val_sub {
my ( $pos, $a ) = @_;
my $val = $a->[$pos];
return $val;
}
my @arr = (80, 40, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0);
say "RET:", binary_search( 0, $#arr, 0, \&get_val_sub, \@arr );
The problem is I'm not sure my last else (marked with ## SEE MY QUESTION BELOW ##
) is "pretty". Is there a better way of doing that?
Binary Search is a searching algorithm for finding an element's position in a sorted array. In this approach, the element is always searched in the middle of a portion of an array. Binary search can be implemented only on a sorted list of items. If the elements are not sorted already, we need to sort them first.
Binary search begins by comparing an element in the middle of the array with the target value. If the target value matches the element, its position in the array is returned. If the target value is less than the element, the search continues in the lower half of the array.
The exists() function in Perl is used to check whether an element in an given array or hash exists or not. This function returns 1 if the desired element is present in the given array or hash else returns 0.
Binary search is searching technique that works by finding the middle of the array for finding the element. For array of strings also the binary search algorithm will remain the same. But the comparisons that are made will be based on string comparison.
Although I initially agreed with Axeman's answer ... it is, in a small way, similar to my first (really bad) answer in using linear logic (at least a small bit of it). Specifically, there is no reason to call $get_val_subref
with $mid - 1
. That's an unnecessary linear search step.
Here's what I would suggest. In addition to avoiding linear searching, it has the benefit of being extremely simple:
sub binary_search {
...
my ( $mid, $val, $solution );
while ( $min <= $max ) {
...
else {
$solution = $mid; # Store a possible solution.
$max = $mid - 1; # But continue with the binary search
# until $min and $max converge on each other.
}
}
return $solution;
}
Although I first agreed with FM's answer, the case that you show (with all the zeroes) is not a good case for a linear back search. And although I didn't like that you simply continued the binary search, "the first x" does have a computable value, and would still have sub-linear performance, while the linear back search has--of course--a linear one.
So I like your idea, but it's more compact like this:
else {
return $mid unless
( $mid > $min
and $get_val_subref->( $mid - 1, $get_val_sub_extra_data )
== $searched_val
);
$max = $mid - 1;
}
The linear back search is an easier computation, but as the value functions get more complex, the fewer the computations the better.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With