 

Perl regexp matching for strings with special characters

Tags: regex, perl

I have a list of substrings that I need to match against a list of URL strings. The substrings contain special characters such as '|', '*', '-', '+', etc. If a URL contains one of the substrings, I need to perform some operation; for now, let's just say I print "TRUE" to the console.

I did this by first reading the list of substrings into a hash. I then tried a simple regexp match of each substring against each URL until a match is found. The code is something like this:

my %ads_list_hash;
my $lines = 0;

open my $ADS, '<', $ad_file or die "can't open $ad_file: $!";

while(<$ADS>) {
        chomp;

        $ads_list_hash{$lines} = $_;
        $lines++;
}

close $ADS;

open my $IN, '<', $inputfile or die "can't open $inputfile";      
my $first_line = <$IN>;

while(<$IN>) {      
       chomp;       

       my @hhfile = split /,/;       
       for my $count (0 .. $lines - 1) {   # valid keys run from 0 to $lines - 1

            if($hhfile[9] =~ /$ads_list_hash{$count}/) {
                print "$hhfile[9]\t$ads_list_hash{$count}\n";

                print "TRUE !\n";
                last;
            }
       }

 }

 close $IN;

The problem is that the substrings contain a lot of special characters, which cause errors in the match $hhfile[9] =~ /$ads_list_hash{$count}/. A few examples:

+adverts/
.to/ad.php|
/addyn|*|adtech;

I get an error on lines like these, which basically says "Quantifier follows nothing in regexp". Do I need to change something in the regexp matching syntax to avoid these?

sfactor asked Mar 25 '11 13:03


1 Answer

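You need to escape the special characters in the string so that the regex engine treats them as literal characters rather than as metacharacters such as quantifiers or alternation.

Enclosing the interpolated variable between \Q and \E will do the job: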

if($hhfile[9] =~ /\Q$ads_list_hash{$count}\E/) {
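
To illustrate the effect of \Q and \E, here is a minimal, self-contained sketch. The three patterns are the examples from the question; the URLs and the helper name first_literal_match are hypothetical and exist only for this demonstration.

```perl
use strict;
use warnings;

# The three patterns come from the question; the URLs are made-up sample data.
my @ads  = ('+adverts/', '.to/ad.php|', '/addyn|*|adtech;');
my @urls = (
    'http://example.com/+adverts/banner.gif',
    'http://example.com/index.html',
    'http://example.com/addyn|*|adtech;loc=100',
);

# Hypothetical helper: return the first pattern that occurs literally
# inside $url, or undef if none of them does.
sub first_literal_match {
    my ($url, @patterns) = @_;
    for my $pat (@patterns) {
        # \Q...\E applies quotemeta to the interpolated value, so '+', '|',
        # '*' and '.' are matched as plain characters.
        return $pat if $url =~ /\Q$pat\E/;
    }
    return;
}

for my $url (@urls) {
    my $hit = first_literal_match($url, @ads);
    print defined $hit ? "TRUE !\t$url\t$hit\n" : "no match\t$url\n";
}
```

Because the match here is a purely literal substring test, index($url, $pat) >= 0 would likely work as well and avoids the regex engine entirely; \Q...\E is the better fit when the escaped string is only one part of a larger pattern.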
codaddict answered Oct 14 '22 05:10