Perl Regex Match and Removal

Tags:

regex

perl

I have a string which starts with //# and runs up to the newline character. I have figured out the regex for it, which is ..#([^\n]*).

My question is: how do you remove such a line from a file when this pattern matches?

Azlam asked Sep 17 '08 06:09


2 Answers

Your regex is badly chosen on several points:

  1. Instead of matching two slashes specifically, you use .. to match two characters that can be anything at all, presumably because you don’t know how to match slashes when you’re also using them as delimiters. (Actually, dots match almost anything, as we’ll see in #3.)

    Within a slash-delimited regex literal, //, you can match slashes simply by protecting them with backslashes, eg. /\/\//. The nicer variant, however, is to use the longer form of regex literal, m//, where you can choose the delimiter, eg. m!!. Since you use something other than slashes for delimitation, you can then write them without escaping them: m!//!. See perldoc perlop.

  2. It’s not anchored to the start of the string so it will match anywhere. Use the ^ start-of-string assertion in front.

  3. You wrote [^\n] to match “any character except newline” when there is a much simpler way to write that, which is just the . wildcard. It does exactly that – match any character except newline (as long as the /s modifier is not in effect).

  4. You are using parentheses to group a part of the match, but the group is neither quantified (you are not specifying that it can match any other number of times than exactly once) nor are you interested in keeping it. So the parentheses are superfluous.

Altogether, that makes it m!^//#.*!. But putting an uncaptured .* (or anything with a * quantifier) at the end of a regex is meaningless, since it never changes whether a string will match or not: the * is happy to match nothing at all.

So that leaves you with m!^//#!.

As for removing the line from the file, as everyone else explained, read it in line by line and print all the lines you want to keep back to another file. If you are not doing this within a larger program, use perl’s command line switches to do it easily:

perl -ni.bak -e'print unless m!^//#!' somefile.txt

Here, the -n switch makes perl put a loop around the code you provide which will read all the files you pass on the command line in sequence. The -i switch (for “in-place”) says to collect the output from your script and overwrite the original contents of each file with it. The .bak parameter to the -i option tells perl to keep a backup of the original file in a file named after the original file name with .bak appended. For all of these bits, see perldoc perlrun.

If you want to do this within the context of a larger program, the easiest way to do it safely is to open the file twice, once for reading, and separately, with IO::AtomicFile, another time for writing. IO::AtomicFile will replace the original file only if it’s successfully closed.

Aristotle Pagaltzis answered Nov 02 '22 04:11


To filter out all the lines in a file that match a certain regex:

perl -n -i.orig -e 'print unless /^#/' file1 file2 file3

The '.orig' after the -i switch creates a backup of the file with the given extension (.orig). You can skip it if you don't need a backup (just use -i).

The -n switch causes perl to execute your instructions (-e ' ... ') for each line in the file. The line is stored in $_ (which is also the default argument for many instructions, in this case: print and regex matching).

Finally, the argument to the -e switch says "print the line unless it has a # character at the start of the line".

PS. There is also a -p switch which behaves like -n, except that each line is always printed after your code has run (good for searching and replacing).

kixx answered Nov 02 '22 04:11