Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perl - extract series of numbers with offsets from array

I am trying to search for series of numbers within an array of integers. For instance, if the array consists of the numbers 1,2,3,10,12,14, it could be summarized to

1 to 3 with offset 1,

10 to 14 with offset 2

Below my code, where I loop over the array from the second element, track the offset between consecutive array elements and create a new 'series' if the offset changes:

use strict;
use warnings;

my @numbers = (1,2,3,10,12,14); #array to extract series from 
my $last_diff;
my $start = $numbers[0];
my $end;
my @all_series; #array will hold all information on series
for my $i (1..($#numbers+1)){
        my $diff;
        if ($i <($#numbers+1)){
                $diff = $numbers[$i] - $numbers[$i-1];
        }
        if (!$diff || ( $last_diff && ($last_diff != $diff)) ) {
                $end = $numbers[$i-1];
                my $series = { 'start'=> $start,
                            'end'  => $end,
                            'offset'=> $start == $end ? 1 : $last_diff,
                };
                push @all_series, $series;
                $start = $numbers[$i];
        }
        $last_diff = $diff;
}

use Data::Dumper;
print Dumper(@all_series);

Output looks as follows:

$VAR1 = {
          'offset' => 1,
          'end' => 3,
          'start' => 1
        };
$VAR2 = {
          'offset' => 1,
          'end' => 10,
          'start' => 10
        };
$VAR3 = {
          'offset' => 2,
          'end' => 14,
          'start' => 12
        };

This is not the desired result, since the last two series could be summarised into one (10 to 14, offset 2 instead of two series).

The flaw in the algorithm is independent from perl, however, maybe someone could give me a hint on how to approach this best, maybe there exist some perl-specific tricks for this.

In my application, all integers in the array are in ascending order and duplicate numbers do not exist.

EDIT If single numbers occur that cannot be assignet to a serious, they should be a series of length one.

The more numbers can be summarized to series, the better (I want to minimize the number of series!)

like image 925
user1981275 Avatar asked Jun 26 '14 13:06

user1981275


1 Answers

The problem is in the ternary operator. If you used plain

offset => $last_diff,

you'd notice that there is

$VAR2 = {
          'offset' => 7,
          'end' => 10,
          'start' => 10

Which is correct in a way. To avoid it, you can undef $diff after pushing to @series. It would produce the expected output for your case, but would still treat 1 2 3 7 10 12 14 as three sequences, starting at 1, 7 and 12. What you need is to make a longer sentence greedy somehow, now.

I experimented with the following, but you should test more:

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

my @numbers = (1, 2, 3, 10, 12, 14);
my $last_diff;
my $start = $numbers[0];
my @all_series;
for my $i (1 .. $#numbers + 1) {
    my $diff;
    if ($i < $#numbers + 1) {
        $diff = $numbers[$i] - $numbers[ $i - 1 ];
    }

    # Merge with the last number from the previous series if needed:
    if (!$last_diff # Just starting a new series.
        and $i > 2  # Far enough to have preceding numbers.
        and $diff and $diff == $numbers[ $i - 1 ] - $numbers[ $i - 2 ]
       ) {
        $all_series[-1]{end} = $numbers[ $i - 3 ];
        $all_series[-1]{offset} = 0 if $all_series[-1]{start} == $all_series[-1]{end};
        $start = $numbers[ $i - 2 ];
    }

    if (! $diff or ( $last_diff && ($last_diff != $diff)) ) {
        push @all_series, { start  => $start,
                            end    => $numbers[ $i - 1 ],
                            offset => $last_diff,
                          };
        $start = $numbers[$i];
        undef $diff;
    }
    $last_diff = $diff;
}

print Dumper(@all_series);
like image 53
choroba Avatar answered Nov 03 '22 03:11

choroba