
Perl. Using until function

I have a simple tab-delimited data file. Each line has four fields. Some lines have all four fields filled in. Other lines have only the first field, and the remaining three are blank, or rather "filled" with a space.

Example of the input file:

    .
    .
    .
    30  13387412    34.80391242 sSN_FIRST
    30  13387412    34.80391242 sSN5_40
    30.1             
    30.2             
    30.3             
    30.4             
    31  14740248    65.60590089 s32138223_44
    31  14740248    65.60590089 s321382_LAST
    .
    .
    .

To reiterate, the "blanks" in my file actually contain a single space, if this matters.

My overall goal is to "fill in" the second and third columns throughout the file (the fourth column is ignored). To do that, I need my script to identify each set of consecutive blank lines, plus the line immediately preceding and the line immediately following that set. In the example above, this would be lines 2 - 7. Once I can do that, I can use the information in the flanking lines, which are filled, to "fill in" the missing entries in the lines in between.

I have been experimenting with the until loop, but I haven't managed to combine it with a loop that reads the data line by line. For example, I can read the lines and find the blank ones:

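# Three-argument open, with an error check in case the file is missing.
open( my $FILE, '<', $mapfile ) or die "Cannot open $mapfile: $!";
my @file = <$FILE>;
close $FILE;
chomp @file;    # strip newlines so the print below does not double them

# Note: starting at 1 skips the first line (index 0).
for ( my $i = 1 ; $i < scalar @file ; $i++ ) 
    {
     my @entries = split( /\t/, $file[ $i ] );
     if ( defined $entries[ 1 ] && $entries[ 1 ] =~ m/ / ) 
        {
         print $file[ $i ]."\n";
        }
    }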

But I would like to use until to read lines and search for the consecutive set of lines I am looking for (the "blank" lines plus the two flanking "full" lines). For example, roughly:

until ( $file[ $i ] =~ m/ / && $file[ $j ] =~ m/ / )   # $i and $j stand for "a line" and "another line"
    {
     # my linear interpolation here
    }

Can anyone give me a hint about how to read through the array and compare lines so that I find every such set across the file?

ES55 asked Jan 09 '13 20:01



1 Answer

What you want to implement is a caching algorithm: something that remembers (caches) previous values, and uses them if nothing new appears. You don't even need a regex for this. :)
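As a minimal sketch of that idea (not the code from this answer; the helper name `fill_forward` and the sample values are made up for illustration), the following remembers the last filled second and third columns and reuses them on blank lines. It assumes tab-delimited, already-chomped lines, and treats a field as blank when it is missing or contains only whitespace:

```perl
use strict;
use warnings;

# Hypothetical helper: copy the most recent non-blank values of
# columns 2 and 3 onto every line whose columns 2 and 3 are blank.
sub fill_forward {
    my @lines = @_;
    my ( $last2, $last3 ) = ( '', '' );    # the "cache"
    my @out;
    for my $line (@lines) {
        # The -1 limit keeps trailing empty fields.
        my ( $c1, $c2, $c3 ) = split /\t/, $line, -1;
        if ( defined $c2 && $c2 =~ /\S/ && defined $c3 && $c3 =~ /\S/ ) {
            ( $last2, $last3 ) = ( $c2, $c3 );    # new values: refresh the cache
        }
        push @out, join "\t", $c1, $last2, $last3;
    }
    return @out;
}

print "$_\n"
    for fill_forward( "30\t13387412\t34.8", "30.1\t \t ", "31\t14740248\t65.6" );
```

This only repeats the previous values; it does not interpolate, which is why the lines in between also need to be held back until the next full line arrives.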

In addition to caching the old values, you need to cache the lines in between. Since you only need the labels from those lines, those are all you have to hold on to. Then, when you reach the next full line, you can do your interpolation and emit the results.

Here's how I'd do it. It's a bit more complex than my original example, but the same principle applies: just store the intermediate lines, then emit the results when you reach your terminal.

use strict;
use warnings;
use feature 'say';


# Get start conditions, and cache those numbers.

sub read_block
{
   my $line = <DATA>;
   return 1 unless defined $line; # we're done if nothing more to read

   # Process and store data from the first line in the block.
   chomp $line;
   my ($last_label, $last_num1, $last_num2, $last_label2) = split /\t/, $line;

   # Keep reading lines until we find the end of the block.
   my @label_cache;
   my $found_last = 0;
   my ($label1, $num1, $num2, $label2);
   while (!$found_last)
   {
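      $line = <DATA>;
      unless (defined $line)
      {
         # Input ended before a closing full line was found; without this
         # check the loop would never terminate.
         warn "Input ended mid-block; dropping the unfinished block\n";
         return 1;
      }
      chomp $line;
      ($label1, $num1, $num2, $label2) = split /\t/, $line;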
      if (defined $num1 && defined $num2)
      {
         $found_last = 1; # We have final numbers!  We can interpolate now.
      }
      else
      {
         push @label_cache, $label1; 
      }
   }

   # Begin display.  Show the first line of the block.
   say "$last_label\t$last_num1\t$last_num2\t$last_label2";

   # Calculate the slope for interpolation: (last - first) / difference
   my $slope1 = ($num1 - $last_num1) / (@label_cache + 1);
   my $slope2 = ($num2 - $last_num2) / (@label_cache + 1);
   my $distance = 0;

   # Display each label and the lines inside.
   foreach my $label (@label_cache)
   {
      ++$distance;
      say $label, "\t",
          $slope1 * $distance + $last_num1, "\t",
          $slope2 * $distance + $last_num2;
   }

   # Display the final line in the block.
   say "$label1\t$num1\t$num2\t$label2";

   # Not done yet, so return a 'false' value.
   return 0;
}

# Main part of the script

my $done = 0;
while (! $done)
{
   $done = read_block();
}


__DATA__
a   3   4   end
e
f
g
h
i
k   15  26  start
k   15  26  end
o
p
q
r
s   3   5   start
s   3   5   end
v
w
x
y   14  16  start

With the fields in the __DATA__ section separated by tabs (which split /\t/ requires), this emits:

a       3       4       end
e       5       7.66666666666667
f       7       11.3333333333333
g       9       15
h       11      18.6666666666667
i       13      22.3333333333333
k       15      26      start
k       15      26      end
o       12.6    21.8
p       10.2    17.6
q       7.8     13.4
r       5.4     9.2
s       3       5       start
s       3       5       end
v       5.75    7.75
w       8.5     10.5
x       11.25   13.25
y       14      16      start

You could then, of course, round or format the numbers however you need. :)
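The script above expects each block to open and close with its own full line, and it detects blanks with `defined`. For input like the question's, where blank fields hold a single space and one full line closes a run while also opening the next, a variation might look like the sketch below. The helper names `is_blank` and `interpolate_lines` are hypothetical, and the `sprintf` formats (whole numbers for column 2, four decimals for column 3) are just one possible rounding choice:

```perl
use strict;
use warnings;

# Hypothetical helper: a field is blank if missing or whitespace only.
sub is_blank { my ($f) = @_; return !defined $f || $f !~ /\S/ }

# Hypothetical helper: takes chomped, tab-delimited lines and returns them
# with columns 2 and 3 linearly interpolated across each run of blank lines.
sub interpolate_lines {
    my @lines = @_;
    my @out;
    my $prev;       # [col2, col3] of the most recent full line
    my @pending;    # labels of the blank lines seen since $prev

    for my $line (@lines) {
        my ( $label, $n1, $n2 ) = split /\t/, $line, -1;
        if ( is_blank($n1) || is_blank($n2) ) {
            push @pending, $label;    # hold the label until the next full line
            next;
        }
        if ( @pending && $prev ) {
            my $steps = scalar(@pending) + 1;
            for my $k ( 1 .. scalar @pending ) {
                my $f = $k / $steps;    # fraction of the way to the full line
                push @out, join "\t", $pending[ $k - 1 ],
                    sprintf( '%.0f', $prev->[0] + $f * ( $n1 - $prev->[0] ) ),
                    sprintf( '%.4f', $prev->[1] + $f * ( $n2 - $prev->[1] ) );
            }
        }
        elsif (@pending) {
            push @out, @pending;    # no full line before these: leave unfilled
        }
        @pending = ();
        push @out, $line;           # full lines pass through unchanged
        $prev = [ $n1, $n2 ];
    }
    push @out, @pending;    # trailing blanks have no closing line: leave unfilled
    return @out;
}

print "$_\n" for interpolate_lines( "a\t0\t0\tx", "b\t \t \t ", "c\t10\t1\ty" );
```

Because the previous full line is carried over in `$prev`, the input does not need a repeated line at each block boundary. Blank lines that lack a full line on either side are emitted with only their label.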

Robert P answered Sep 26 '22 02:09