In Perl, how can I get the matched substring from a regex?

Tags:

perl

My program reads the source code of other programs and collects information about the SQL queries they use. I have a problem getting a substring.

...
$line = <FILE_IN>;
until( ($line =~m/$values_string/i && $line !~m/$rem_string/i) || eof )
{
   if($line =~m/ \S{2}DT\S{3}/i)
   {

   # here I wish to get (only) substring that match to pattern \S{2}DT\S{3} 
   # (7 letter table name) and display it.
      $line =~/\S{2}DT\S{3}/i;
      print $line."\n";
...

As a result, print outputs the whole line, not the substring I expect. I tried different approaches, but I rarely use Perl and am probably making a basic conceptual error. (The position of the table name in the line is not fixed. Another problem is multiple occurrences, e.g. [... SELECT * FROM AADTTAB, BBDTTAB, ...].) How can I obtain that substring?


asked Jul 15 '09 15:07

kato sheen

2 Answers

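Use grouping with parentheses and store the first group, which Perl puts in $1. The match itself never changes $line, which is why printing $line shows the whole line.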

if( $line =~ /(\S{2}DT\S{3})/i )
{
  my $substring = $1;
}
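A minimal runnable sketch of the same idea inside a line-reading loop like the question's. The in-memory filehandle and the two sample lines are assumptions standing in for FILE_IN; only the first table name on each line is captured.

```perl
use strict;
use warnings;

# Assumption: an in-memory filehandle stands in for the question's FILE_IN.
my $source = "SELECT * FROM AADTTAB WHERE id = 1\nprint 'no query here'\n";
open my $in, '<', \$source or die "cannot open in-memory file: $!";

my @found;
while ( my $line = <$in> ) {
    # The match leaves $line unchanged; the captured text goes into $1.
    if ( $line =~ /(\S{2}DT\S{3})/i ) {
        my $substring = $1;
        push @found, $substring;
        print "$substring\n";
    }
}
close $in;
```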

The code above fixes the immediate problem of pulling out the first table name. However, the question also asked how to pull out all the table names. So:

# FROM\s+                  match FROM followed by one or more spaces
# (.+?)                    match (non-greedy) and capture any characters until...
# (?:x|y)                  match x OR y - the next 2 items
# (?<![\s,])\s+(?![\s,])   spaces with no comma directly before or after them,
#                          e.g. before WHERE; the lookarounds consume nothing,
#                          so the last table name is not cut short
# \s*;                     match 0 or more spaces followed by a semicolon
if( $line =~ /FROM\s+(.+?)(?:(?<![\s,])\s+(?![\s,])|\s*;)/i )
{
  # $1 will be "table1, table2, table3"
  # split on commas, ignoring the spaces around them
  my @tables = split(/\s*,\s*/, $1);
  foreach(@tables)
  {
     # $_ = table name
     print $_ . "\n";
  }
}

Result:

If $line = "SELECT * FROM AADTTAB, BBDTTAB;"

Output:

AADTTAB
BBDTTAB

If $line = "SELECT * FROM AADTTAB;"

Output:

AADTTAB

Perl Version: v5.10.0 built for MSWin32-x86-multi-thread
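A self-contained sketch of the FROM-list approach, wrapped in a hypothetical helper `tables_from_line` and run on the two sample lines plus one with a trailing WHERE clause. The terminator uses lookarounds, so it does not swallow the last character of the final table name (a plain `[^,]\s+[^,]` would turn "AADTTAB WHERE" into "AADTTA"). Like the answer, it assumes the FROM list sits on a single line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the table names listed after FROM on one line.
# The list ends at a semicolon or at spaces not touching a comma (e.g. WHERE).
sub tables_from_line {
    my ($line) = @_;
    return () unless $line =~ /FROM\s+(.+?)(?:(?<![\s,])\s+(?![\s,])|\s*;)/i;
    return split /\s*,\s*/, $1;
}

for my $line (
    'SELECT * FROM AADTTAB, BBDTTAB;',
    'SELECT * FROM AADTTAB;',
    'SELECT * FROM AADTTAB, BBDTTAB WHERE x = 1;',
) {
    print join( ' ', tables_from_line($line) ), "\n";
}
```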


answered Sep 19 '22 19:09

Jesse Vogt

I prefer this:

my ( $table_name ) = $line =~ m/(\S{2}DT\S{3})/i;

This scans $line, captures the text matching the pattern, and returns "all" the captures (here, just one) to the list on the left-hand side.

This pseudo-list context is how we catch the first item in a list: the parentheses around $table_name put the assignment in list context. It's the same idiom used to unpack the parameters passed to a subroutine:

my ( $first, $second, @rest ) = @_;


my ( $first_capture, $second_capture, @others ) = $feldman =~ /$some_pattern/;
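A small runnable sketch contrasting scalar and list context on the sample query from the question. The `/g` line is an addition here, not part of the answer above: in list context it returns every match, which covers the multiple-occurrence case.

```perl
use strict;
use warnings;

my $line = 'SELECT * FROM AADTTAB, BBDTTAB;';

# Scalar context: the match returns 1 on success, not the captured text.
my $matched = $line =~ m/(\S{2}DT\S{3})/i;

# List context (parentheses on the left): the captures are returned.
my ($table_name) = $line =~ m/(\S{2}DT\S{3})/i;

# Two capture groups fill the two variables in order.
my ( $first_capture, $second_capture ) =
    $line =~ m/(\S{2}DT\S{3}),\s*(\S{2}DT\S{3})/i;

# With /g in list context, every match is returned, however many there are.
my @all_tables = $line =~ m/(\S{2}DT\S{3})/gi;

print "scalar: $matched\n";                          # scalar: 1
print "list:   $table_name\n";                       # list:   AADTTAB
print "pair:   $first_capture $second_capture\n";    # pair:   AADTTAB BBDTTAB
print "all:    @all_tables\n";                       # all:    AADTTAB BBDTTAB
```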

NOTE: That said, your regex assumes too much about the text to be useful in more than a handful of situations. Do you really want to skip every table name that doesn't have "DT" in positions 3 and 4 of 7 characters? It's good enough 1) for quick-and-dirty work, or 2) if you're okay with limited applicability.

answered Sep 16 '22 19:09

Axeman

