I am trying to search for series of numbers within an array of integers. For instance, if the array consists of the numbers 1,2,3,10,12,14
, it could be summarized to
1 to 3 with offset 1,
10 to 14 with offset 2
Below my code, where I loop over the array from the second element, track the offset between consecutive array elements and create a new 'series' if the offset changes:
use strict;
use warnings;
my @numbers = (1,2,3,10,12,14); #array to extract series from
my $last_diff;
my $start = $numbers[0];
my $end;
my @all_series; #array will hold all information on series
for my $i (1..($#numbers+1)){
my $diff;
if ($i <($#numbers+1)){
$diff = $numbers[$i] - $numbers[$i-1];
}
if (!$diff || ( $last_diff && ($last_diff != $diff)) ) {
$end = $numbers[$i-1];
my $series = { 'start'=> $start,
'end' => $end,
'offset'=> $start == $end ? 1 : $last_diff,
};
push @all_series, $series;
$start = $numbers[$i];
}
$last_diff = $diff;
}
use Data::Dumper;
print Dumper(@all_series);
Output looks as follows:
$VAR1 = {
'offset' => 1,
'end' => 3,
'start' => 1
};
$VAR2 = {
'offset' => 1,
'end' => 10,
'start' => 10
};
$VAR3 = {
'offset' => 2,
'end' => 14,
'start' => 12
};
This is not the desired result, since the last two series could be summarised into one (10 to 14, offset 2 instead of two series).
The flaw in the algorithm is independent from perl, however, maybe someone could give me a hint on how to approach this best, maybe there exist some perl-specific tricks for this.
In my application, all integers in the array are in ascending order and duplicate numbers do not exist.
EDIT If single numbers occur that cannot be assignet to a serious, they should be a series of length one.
The more numbers can be summarized to series, the better (I want to minimize the number of series!)
The problem is in the ternary operator. If you used plain
offset => $last_diff,
you'd notice that there is
$VAR2 = {
'offset' => 7,
'end' => 10,
'start' => 10
Which is correct in a way. To avoid it, you can undef $diff
after pushing to @series. It would produce the expected output for your case, but would still treat 1 2 3 7 10 12 14
as three sequences, starting at 1, 7 and 12. What you need is to make a longer sentence greedy somehow, now.
I experimented with the following, but you should test more:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my @numbers = (1, 2, 3, 10, 12, 14);
my $last_diff;
my $start = $numbers[0];
my @all_series;
for my $i (1 .. $#numbers + 1) {
my $diff;
if ($i < $#numbers + 1) {
$diff = $numbers[$i] - $numbers[ $i - 1 ];
}
# Merge with the last number from the previous series if needed:
if (!$last_diff # Just starting a new series.
and $i > 2 # Far enough to have preceding numbers.
and $diff and $diff == $numbers[ $i - 1 ] - $numbers[ $i - 2 ]
) {
$all_series[-1]{end} = $numbers[ $i - 3 ];
$all_series[-1]{offset} = 0 if $all_series[-1]{start} == $all_series[-1]{end};
$start = $numbers[ $i - 2 ];
}
if (! $diff or ( $last_diff && ($last_diff != $diff)) ) {
push @all_series, { start => $start,
end => $numbers[ $i - 1 ],
offset => $last_diff,
};
$start = $numbers[$i];
undef $diff;
}
$last_diff = $diff;
}
print Dumper(@all_series);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With