Extract the required substring from another string - Perl

I want to extract a substring from a line in Perl. Let me explain with an example:

fhjgfghjk3456mm   735373653736
icasd 666666666666
111111111111

From each of the lines above, I want to extract only the 12-digit number. I tried using the split function:

my @cc = split(/[0-9]{12}/,$line);
print @cc;

But this removes the matched part of the string and stores the remainder in @cc. I want the part that matches the pattern to be printed. How do I do that?

Amey asked Dec 06 '22 13:12


2 Answers

You can do it with regular expressions:

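#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';    # say is not enabled by default

my $string = 'fhjgfghjk3456mm 735373653736 icasd 666666666666 111111111111';
# Walk through every standalone 12-digit number in the string.
while ($string =~ m/\b(\d{12})\b/g) {
  say $1;
}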

Test the regex here: http://rubular.com/r/Puupx0zR9w
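
The same pattern can also be applied line by line, which is closer to the input in the question. The following is a minimal sketch, not the only way to do it: the helper name twelve_digit_numbers is made up for illustration, and the sample lines are copied from the question. It relies on the fact that a /g match in list context returns the captured text of every match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines taken from the question.
my @lines = (
    'fhjgfghjk3456mm   735373653736',
    'icasd 666666666666',
    '111111111111',
);

# Hypothetical helper: return every standalone 12-digit number in a line.
sub twelve_digit_numbers {
    my ($line) = @_;
    # In list context, a /g match returns the captured text of every match.
    my @found = $line =~ /\b(\d{12})\b/g;
    return @found;
}

for my $line (@lines) {
    print "$_\n" for twelve_digit_numbers($line);
}
```

Because of the \b anchors, a run of 13 or more digits is skipped rather than truncated to its first 12 digits.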

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/\b(\d+)\b/)->explain();

The regular expression (explained here with \d+ in place of \d{12}; the structure is otherwise the same):

(?-imsx:\b(\d+)\b)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
simbabque answered Jan 21 '23 03:01


The built-in variable $1 holds the text captured by the first pair of parentheses in the last successful match. The match operator itself, in scalar context, only returns true or false, not the matched text. The simplest solution here is to put parentheses around the part you want and then print $1.

my $strn = "fhjgfghjk3456mm 735373653736\nicasd\n666666666666 111111111111";
# Only use $1 if the match succeeded.
if ($strn =~ m/([0-9]{12})/) {
    print "$1\n";
}

The parentheses capture just the twelve-digit number, which is then available in $1. Without the /g modifier, only the first match (735373653736 here) is captured.
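
As a rough sketch of the same idea wrapped in a reusable form, the hypothetical helper first_twelve_digits below (the name is an assumption, not part of the answer) returns the first 12-digit run or undef. It reads $1 only after checking that the match succeeded, since a failed match does not reset $1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first 12-digit run in a string, or undef.
sub first_twelve_digits {
    my ($text) = @_;
    # Read $1 only when the match succeeded; a failed match does not reset it.
    if ($text =~ /([0-9]{12})/) {
        return $1;
    }
    return;
}

my $strn = "fhjgfghjk3456mm 735373653736\nicasd\n666666666666 111111111111";
my $first = first_twelve_digits($strn);
print defined $first ? "$first\n" : "no match\n";
```

Note that this pattern has no \b anchors, so it would also pick the first 12 digits out of a longer run of digits.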

PinkElephantsOnParade answered Jan 21 '23 04:01