
Perl, match one pattern multiple times in the same line delimited by unknown characters

Tags: regex, perl

I've been able to find similar, but not identical questions to this one. How do I match one regex pattern multiple times in the same line delimited by unknown characters?

For example, say I want to match the pattern HEY. I'd want to recognize all of the following:

HEY

HEY HEY

HEYxjfkdsjfkajHEY

So I'd count 5 HEYs there. So here's my program, which works for everything but the last one:

open ( FH, $ARGV[0]);
while(<FH>)
{
  foreach $w ( split )
  {
      if ($w =~ m/HEY/g)
      {
            $count++;
      }
  }
}

So my question is how do I replace that foreach loop so that I can recognize patterns delimited by weird characters in unknown configurations (like shown in the example above)?

EDIT:

Thanks for the great responses thus far. I just realized I need one other thing though, which I put in a comment below.

One question though: is there any way to save the matched term as well? In my case, is there any way to reference $w, say if the regex were more complicated (a sequence of alphanumeric characters, for instance) and I wanted to store each match in a hash with its number of occurrences?

asked Feb 06 '12 06:02 by varatis




2 Answers

One way is to capture all matches of the string and see how many you got. Like so:

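# Open the file named on the command line; stop with a reason if that fails.
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $w = <$fh>) {
    chomp $w;    # drop the trailing newline so the output has one per line
    # In list context, m//g returns every captured match on the line.
    my @matches = $w =~ m/(HEY)/g;
    my $count = scalar(@matches);
    print "$count\t$w\n";
}
close($fh);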

EDIT:

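Yes, there is! Just loop over all the matches and increment the count in a hash. Use the loop variable as the key rather than $1: in list context the whole match runs before the loop body starts, so $1 would hold only the last match on the line.

my %hash;
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $w = <$fh>) {
    # Each element of the list returned by m//g is one captured match.
    foreach my $match ($w =~ /(HEY)/g) {
        $hash{$match}++;
    }
}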
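
For the more general case raised in the question's edit (say, a run of alphanumeric characters), a while loop in scalar context advances one match at a time, so $1 always refers to the current match. This is only a sketch: the pattern ([A-Za-z0-9]+), the sample strings, and the sub name tally_matches are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tally every match of a capturing pattern across lines.
sub tally_matches {
    my ($pattern, @lines) = @_;
    my %seen;
    for my $line (@lines) {
        # In scalar context, /g resumes after the previous match,
        # so $1 holds the current match on every iteration.
        while ($line =~ /$pattern/g) {
            $seen{$1}++;
        }
    }
    return %seen;
}

my %seen = tally_matches(qr/([A-Za-z0-9]+)/, 'HEY you, HEY!', 'you-HEY');
print "$_\t$seen{$_}\n" for sort keys %seen;
```

With these sample strings the hash ends up with HEY counted 3 times and you counted 2 times.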

answered Sep 24 '22 14:09 by masaers


The problem is you really don't want to call split(). It splits the line into words, where a word is just "everything but whitespace", and you'll note that your last line has only a single "word" (though you won't find it in the dictionary). Because your if tests each word once, that word adds only 1 to the count even though it contains HEY twice.
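
To make the undercount concrete, here is a small sketch run against the three sample lines from the question. The sub name count_per_word is made up for illustration; it mirrors the question's loop by testing each whitespace-separated word once.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the question's loop: one test per word.
sub count_per_word {
    my ($line) = @_;
    my $count = 0;
    foreach my $w (split ' ', $line) {
        # A single test can succeed at most once per word,
        # no matter how many times HEY occurs inside it.
        $count++ if $w =~ /HEY/;
    }
    return $count;
}

my @lines = ('HEY', 'HEY HEY', 'HEYxjfkdsjfkajHEY');
my $total = 0;
$total += count_per_word($_) for @lines;
print "$total\n";    # prints 4: the last line contributes only 1
```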

What you really want is to keep looking through each line, counting every HEY and starting where you left off each time. That requires the /g modifier at the end, used as the condition of a while loop so the search keeps going:

my $count = 0;
while(<>)
{
      # In scalar context, each /g match resumes where the previous one ended.
      while (/HEY/g)
      {
            $count++;
      }
}

print "$count\n";

There is, of course, more than one way to do it but this sticks close to your example. Other people will post other wonderful examples too. Learn from them all!
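
As a quick check of the nested-loop approach, here is the same idea wrapped in a sub and run on the three sample lines from the question. The sub name count_hey is an assumption for illustration, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of HEY in one line.
sub count_hey {
    my ($line) = @_;
    my $count = 0;
    # Each successful scalar-context /g match resumes where the last one ended.
    $count++ while $line =~ /HEY/g;
    return $count;
}

my @samples = ('HEY', 'HEY HEY', 'HEYxjfkdsjfkajHEY');
my $total = 0;
$total += count_hey($_) for @samples;
print "$total\n";    # prints 5
```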

answered Sep 25 '22 14:09 by Wes Hardaker