I have a simple tab-delimited data file. Each line has four fields. Some lines are completely filled. Other lines have only the first field filled, and the remaining three are blank, or rather "filled" with a single space.
Example of the input file:
.
.
.
30 13387412 34.80391242 sSN_FIRST
30 13387412 34.80391242 sSN5_40
30.1
30.2
30.3
30.4
31 14740248 65.60590089 s32138223_44
31 14740248 65.60590089 s321382_LAST
.
.
.
To reiterate, the "blanks" in my file actually contain a single space, if this matters.
My overall goal is to "fill in" the second and third columns (the fourth column is ignored) throughout the file. To do that, I need my script to identify each set of consecutive blank lines, plus the line immediately preceding and the line immediately following that set. In the example above, this would be lines 2 - 7. Once I can do that, I can use the information in the flanking lines, which are filled, to help "fill in" the missing entries in the lines in between.
I have been experimenting with the until loop, but I'm not succeeding in coupling it with a loop that reads the data line by line. For example, I can read the lines and find the blank lines:
open( my $FILE, '<', $mapfile ) or die "Cannot open $mapfile: $!";
my @file = <$FILE>;
close $FILE;
for ( my $i = 1 ; $i < scalar @file ; $i++ )
{
    my @entries = split( /\t/, $file[ $i ] );
    if ( $entries[ 1 ] =~ m/ / )
    {
        # Each line still ends with its own newline, so none is added here.
        print $file[ $i ];
    }
}
But I am trying to employ the until loop, so as to read lines and search for the consecutive set of lines I am looking for ("blank" lines plus the two flanking "full" lines). For example:
until ( $file[ a line ] =~ m/ / && $file[ another line ] =~ m/ / )
{
my linear interpolation here;
}
Can anyone give me a hint about how to couple a way to read the array and compare lines to find the sets I need across the file?
What you want to implement is a caching algorithm: something that remembers (caches) previous values, and uses them if nothing new appears. You don't even need a regex for this. :)
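As a rough illustration of that basic caching idea (before any interpolation), here is a minimal sketch. The `fill_forward` name and the array-ref row layout are assumptions made for this example, not something from the question; it simply carries the last seen values forward into rows that lack them.

```perl
use strict;
use warnings;

# Hypothetical helper: carry the most recent values of columns 2 and 3
# forward into rows where those columns are missing.
sub fill_forward {
    my @rows = @_;              # each row is an array ref: [label, num1, num2]
    my ($cached1, $cached2);    # the "cache": most recent full values
    my @filled;
    for my $row (@rows) {
        my ($label, $num1, $num2) = @$row;
        if (defined $num1 && defined $num2) {
            ($cached1, $cached2) = ($num1, $num2);   # refresh the cache
        }
        push @filled, [ $label, $cached1, $cached2 ];
    }
    return @filled;
}

# The first row is full, so the cache is always populated in this example.
my @result = fill_forward([ 'a', 3, 4 ], ['e'], ['f'], [ 'k', 15, 26 ]);
print join("\t", @$_), "\n" for @result;
```

This only repeats the previous values, so rows `e` and `f` both come out as `3` and `4`. Interpolating between the flanking lines needs the extra bookkeeping described next.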
In addition to caching the old values, you need to cache the lines in between. Since you only need the labels from those lines, you only need to hold on to those. Then, when you reach the next full line, you can do your interpolation and emit the results.
Here's how I'd do it. It's a bit more complex than my original example, but the same principle applies: just store the intermediate lines, then emit the results when you reach your terminal.
use strict;
use warnings;
use feature 'say';
# Get start conditions, and cache those numbers.
sub read_block
{
    my $line = <DATA>;
    return 1 unless defined $line; # we're done if nothing more to read
    # Process and store data from the first line in the block.
    chomp $line;
    my ($last_label, $last_num1, $last_num2, $last_label2) = split /\t/, $line;
    # Keep reading lines until we find the end of the block.
    my @label_cache;
    my $found_last = 0;
    my ($label1, $num1, $num2, $label2);
    while (!$found_last)
    {
        $line = <DATA>;
        # Guard against the data ending before a closing full line appears.
        die "Unexpected end of data inside a block\n" unless defined $line;
        chomp $line;
        ($label1, $num1, $num2, $label2) = split /\t/, $line;
        if (defined $num1 && defined $num2)
        {
            $found_last = 1; # We have final numbers! We can interpolate now.
        }
        else
        {
            push @label_cache, $label1;
        }
    }
    # Begin display. Show the first line of the block.
    say "$last_label\t$last_num1\t$last_num2\t$last_label2";
    # Calculate the slope for interpolation: (last - first) / number of steps
    my $slope1 = ($num1 - $last_num1) / (@label_cache + 1);
    my $slope2 = ($num2 - $last_num2) / (@label_cache + 1);
    my $distance = 0;
    # Display each label and the lines inside.
    foreach my $label (@label_cache)
    {
        ++$distance;
        say $label, "\t",
            $slope1 * $distance + $last_num1, "\t",
            $slope2 * $distance + $last_num2;
    }
    # Display the final line in the block.
    say "$label1\t$num1\t$num2\t$label2";
    # Not done yet, so return a 'false' value.
    return 0;
}
# Main part of the script
my $done = 0;
while (! $done)
{
    $done = read_block();
}
__DATA__
a 3 4 end
e
f
g
h
i
k 15 26 start
k 15 26 end
o
p
q
r
s 3 5 start
s 3 5 end
v
w
x
y 14 16 start
With the fields in the __DATA__ section separated by tabs, the script emits:
a 3 4 end
e 5 7.66666666666667
f 7 11.3333333333333
g 9 15
h 11 18.6666666666667
i 13 22.3333333333333
k 15 26 start
k 15 26 end
o 12.6 21.8
p 10.2 17.6
q 7.8 13.4
r 5.4 9.2
s 3 5 start
s 3 5 end
v 5.75 7.75
w 8.5 10.5
x 11.25 13.25
y 14 16 start
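One caveat when applying this to the file in the question: there the "blank" fields hold a single space, so after `split /\t/` they are still defined and a `defined` test would treat the line as full. A minimal sketch of one way around that, assuming a hypothetical `split_fields` helper (not part of the script above) that turns empty or whitespace-only fields into `undef`:

```perl
use strict;
use warnings;

# Hypothetical helper: split a tab-delimited line and map fields that are
# empty or whitespace-only to undef, so "defined" tests keep working.
sub split_fields {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    return map { /\S/ ? $_ : undef } split /\t/, $line, -1;
}

my @full  = split_fields("30\t13387412\t34.80391242\tsSN5_40\n");
my @blank = split_fields("30.1\t \t \t \n");

print defined $full[1]  ? "full\n" : "blank\n";
print defined $blank[1] ? "full\n" : "blank\n";
```

Under that assumption, calling `split_fields($line)` in place of `split /\t/, $line` would let the rest of the logic stay as it is.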
You could then, of course, do whatever kind of number rounding or formatting that you needed. :)
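For instance, `sprintf` can fix the number of decimal places. This is only a sketch; the `fmt` helper name and the choice of two decimals are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: format a number with a fixed count of decimal places.
sub fmt {
    my ($value, $places) = @_;
    # "%.*f" takes the precision from the argument that precedes the value.
    return sprintf "%.*f", $places, $value;
}

print fmt(7.66666666666667, 2), "\n";   # prints 7.67
print fmt(9, 2), "\n";                  # prints 9.00
```

In the script above, the interpolated values passed to `say` could be wrapped in a call like `fmt(..., 2)` before being printed.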