Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python regex negating metacharacters

Python metacharacter negation.

After scouring the net and writing a few different syntaxes I'm out of ideas.

Trying to rename some files. They have a year in the title e.g. [2002]. Some don't have the brackets, which I want to rectify.

So I'm trying to find a regex (that I can compile preferably) that in my mind looks something like (^[\d4^]) because I want the set of 4 numbers that don't have square brackets around them. I'm using the brackets in the hope of binding this so that I can then rename using something like [\1].

like image 846
Dee Avatar asked Jan 19 '23 04:01

Dee


1 Answers

If you want to check for things around a pattern you can use lookahead and lookbehind assertions. These don't form part of the match but say what you expect to find (or not find) around it.

As we don't want brackets we'll need use a negative lookbehind and lookahead.

A negative lookahead looks like this (?!...) where it matches if ... does not come next. Similarly a negative lookbehind looks like this (?<!...) and matches if ... does not come before.

Our example is make slightly more complicated because we're using [ and ] which themselves have meaning in regular expressions so we have to escape them with \.

So we can build up a pattern as follows:

  • A negative lookbehind for [ - (?<!\[)
  • Four digits - \d{4}
  • A negative lookahead for ] - (?!\])

This gives us the following Python code:

>>> import re
>>> r = re.compile("(?<!\[)\d{4}(?!\])")
>>> r.match(" 2011 ")
>>> r.search(" 2011 ")
<_sre.SRE_Match object at 0x10884de00>
>>> r.search("[2011]")

To rename you can use the re.sub function or the sub function on your compiled pattern. To make it work you'll need to add an extra set of brackets around the year to mark it as a group.

Also, when specifying your replacement you refer to the group as \1 and so you have to escape the \ or use a raw string.

>>> r = re.compile("(?<!\[)(\d{4})(?!\])")
>>> name = "2011 - This Year"
>>> r.sub(r"[\1]",name)
'[2011] - This Year'
like image 159
Dave Webb Avatar answered Jan 28 '23 16:01

Dave Webb