Perl while loop when parsing a string

I have a question about parsing a vector of strings like this:

"chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,"
"chr1-247951785-G-.:G98K,"
"chr13-86597895-S-78:M34*,chr13-56891235-S-8:G87K,chr13-235689125-S-7:M389L,"

I want to get:

"M92R R236G"
"G98K"
"M34* G87K M389L"

When I use

while ($info1=~s/^(.*)\:(([A-Z\*]){1}([\d]+)([A-Z\*]){1})\,//) 
{
    $pos=$2; 
}

$pos only gives me the last match for each row, that is:

"R236G"
"G98K"
"M389L"

How should I correct the script?

user2917442 asked Feb 27 '26 06:02


2 Answers

The reason your code isn't working is that you have a greedy ^(.*) at the start of the regular expression. It consumes as much of the target string as possible while still letting the rest of the pattern match, so the first substitution matches the last occurrence of the substring and removes everything up to and including it. Nothing is left for a second iteration. You can fix it by changing it to the non-greedy pattern ^(.*?).
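A minimal sketch of the difference, using the first record from the question. The helper name collect_fields and its $greedy flag are made up for illustration; the pattern is the simplified form of the one in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: repeatedly strip a leading "prefix:CODE," from the
# record, as the loop in the question does, and collect each captured CODE.
# $greedy chooses between the original ^(.*) prefix and the non-greedy ^(.*?).
sub collect_fields {
    my ($record, $greedy) = @_;
    my $re = $greedy
        ? qr/^(.*):([A-Z*]\d+[A-Z*]),/
        : qr/^(.*?):([A-Z*]\d+[A-Z*]),/;
    my @found;
    while ($record =~ s/$re//) {
        push @found, $2;
    }
    return @found;
}

my $record = 'chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,';
print join(' ', collect_fields($record, 1)), "\n";   # greedy:     R236G
print join(' ', collect_fields($record, 0)), "\n";   # non-greedy: M92R R236G
```

With the greedy prefix the loop body runs once, because the first substitution empties the string. With the non-greedy prefix it runs once per field.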

A few other notes on your regular expression:

  • There is no need to escape : or , anywhere in a pattern, or * when it is inside a character class [...]

  • There is never a need for the quantifier {1}, as a pattern without a quantifier already matches exactly once

  • There is no need to put \d inside a character class [\d], as it works fine on its own

  • There is no need to enclose subpatterns in parentheses unless you need access to whatever substring matched that subpattern when the match succeeds. So, for instance ^.* is fine without the parentheses

This modification behaves like your code with the non-greedy fix applied, but is much more concise:

while ($info1 =~ s/^.*?:([A-Z*]\d+[A-Z*]),// ) {
  my $pos = $1;
  ...
}

But the best solution is to use a global match that finds all occurrences of a pattern within a string, and doesn't need to modify the string in the process.

This program does what you describe. It collects every run of uppercase letters, digits, or asterisks that follows a colon in each record.

use strict;
use warnings;

while (<DATA>) {
  my @fields = /:([A-Z0-9*]+)/g;
  print "@fields\n";
}

__DATA__
"chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,"
"chr1-247951785-G-.:G98K,"
"chr13-86597895-S-78:M34*,chr13-56891235-S-8:G87K,chr13-235689125-S-7:M389L,"

output

M92R R236G
G98K
M34* G87K M389L
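The question asks for each result wrapped in double quotes, which the program above does not add. A possible variation, assuming the records are already held in an array: the helper name quoted_fields is hypothetical, and the pattern is the stricter letter-digits-letter form from the while loop rather than [A-Z0-9*]+.

```perl
use strict;
use warnings;

# Hypothetical helper: pull every code that sits between a colon and a comma
# out of one record and return them space-separated inside double quotes.
sub quoted_fields {
    my ($record) = @_;
    # In list context, a /g match returns all captures without modifying $record.
    my @fields = $record =~ /:([A-Z*]\d+[A-Z*]),/g;
    return '"' . join(' ', @fields) . '"';
}

my @records = (
    'chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,',
    'chr1-247951785-G-.:G98K,',
    'chr13-86597895-S-78:M34*,chr13-56891235-S-8:G87K,chr13-235689125-S-7:M389L,',
);

print quoted_fields($_), "\n" for @records;
```

Because the global match leaves the input untouched, the original records remain available afterwards, unlike with the substitution loop.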

Borodin answered Feb 28 '26 19:02


Using a one-liner:

$ perl -ne 'print q/"/ . join(" ", m/:([^,]+),/g) . qq/"\n/' file
"M92R R236G"
"G98K"
"M34* G87K M389L"

To turn it into a script, let B::Deparse print the code that perl generates for the one-liner:

$ perl -MO=Deparse -ne 'print "\042" . join(" ", m/:([^,]+),/g) . "\042\n"' file

The resulting script:

LINE: while (defined($_ = <ARGV>)) {
    print '"' . join(' ', /:([^,]+),/g) . qq["\n];
}

Gilles Quenot answered Feb 28 '26 19:02