
Regular expression using negative lookbehind not working in Notepad++

I have a source file with literally hundreds of occurrences of the strings flecha.jpg and flecha1.jpg, but I need to find occurrences of any other .jpg image (e.g. casa.jpg, moto.jpg, whatever).

I have tried using a regular expression with negative lookbehind, like this:

(?<!flecha|flecha1).jpg

but it doesn't work! Notepad++ simply says that it is an invalid regular expression.

I have tried the regex elsewhere and it works (here is an example), so I guess the problem is with NPP's handling of regexes or with its syntax for lookbehinds/lookaheads.

So how could I achieve the same regex result in NPP?

If useful, I am using Notepad++ version 6.3 Unicode

As an extra, if you are so kind, what would be the syntax to achieve the same thing but with optional numbers (in this case only '1') as a suffix of my string? (even if it doesn't work in NPP, just to know)...

I tried (?<!flecha[1]?).jpg but it doesn't work. It should work the same as the other regex, see here (RegExr)

DiegoDD asked Jun 24 '13 23:06



2 Answers

Notepad++ does not seem to support variable-length look-behinds (a limitation it shares with some other tools). A workaround is to use more than one fixed-length look-behind:

(?<!flecha)(?<!flecha1)\.jpg

As you can check, the matches are the same, but this version works in Notepad++.

Notice that I escaped the dot (\.): since you are trying to match file extensions, what you want is a literal dot. The way you had it, the unescaped . was a wildcard that could match any character.

About the extra question, unfortunately, as we can't have variable-length look-behinds, it is not possible to have optional suffixes (numbers) without having multiple look-behinds.
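
The behaviour described in this answer can be reproduced outside Notepad++. The following is a minimal sketch that uses Python's re module as a stand-in, on the assumption that it is a fair analogue: it also rejects look-behinds whose alternatives have different lengths. Notepad++ uses a different engine, so error messages and edge cases may differ. The sample text, the helper names and the \w+ prefix (added only so that whole file names are returned) are not part of the original answer.

```python
import re

# Hypothetical sample text, not taken from the asker's file.
SAMPLE = "flecha.jpg casa.jpg flecha1.jpg moto.jpg"

VARIABLE = r"(?<!flecha|flecha1)\.jpg"    # alternatives of length 6 and 7
OPTIONAL = r"(?<!flecha[1]?)\.jpg"        # optional suffix, length 6 or 7
FIXED = r"(?<!flecha)(?<!flecha1)\.jpg"   # two fixed-length look-behinds


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def other_jpgs(text):
    """Return .jpg names that do not end in flecha or flecha1."""
    # \w+ is an addition for illustration: it captures the file name
    # in front of the extension so the result is readable.
    return re.findall(r"\w+" + FIXED, text)


print(compiles(VARIABLE), compiles(OPTIONAL), compiles(FIXED))
print(other_jpgs(SAMPLE))
```

In Python the two variable-length patterns fail to compile, while the version with two separate look-behinds compiles and returns only casa.jpg and moto.jpg for this sample.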

acdcjunior answered Sep 17 '22 12:09


Solving the problem of the variable-length-negative-lookbehind limitation in Notepad++

Here are several strategies for working around this limitation in Notepad++ (or in any regex engine with the same limitation).

Defining the problem

Notepad++ does not support the use of variable-length negative lookbehind assertions, and it would be nice to have some workarounds. Let's consider the example in the original question, but assume we want to avoid occurrences of files named flecha with any number of digits after flecha, and with any characters before flecha. In that case, a regex utilizing a variable-length negative lookbehind would look like (?<!flecha[0-9]*)\.jpg.

Strings we don't want to match in this example

  • flecha.jpg
  • flecha1.jpg
  • flecha00501275696.jpg
  • aflecha.jpg
  • img_flecha9.jpg
  • abcflecha556677.jpg

The Strategies

  1. Inserting Temporary Markers

    Begin by performing a find-and-replace on the instances that you want to avoid working with - in our case, instances of flecha[0-9]*\.jpg. Insert a special marker to form a pattern that doesn't appear anywhere else. For this example, we will insert an extra . before .jpg, assuming that ..jpg doesn't appear elsewhere. So we do:

    Find: (flecha[0-9]*)(\.jpg)

    Replace with: $1.$2

    Now you can search your document for all the other .jpg filenames with a simple regex like \w+\.jpg or (?<!\.)\.jpg and do what you want with them. When you're done, do a final find-and-replace operation where you replace all instances of ..jpg with .jpg, to remove the temporary marker.

  2. Using a negative lookahead assertion

    A negative lookahead assertion can be used to make sure that you're not matching the undesired file names:

    (?<!\S)(?!\S*flecha\d*\.jpg)\S+\.jpg

    Breaking it down:

    • (?<!\S) ensures that your match begins at the start of a file name, and not in the middle, by asserting that your match is not preceded by a non-whitespace character.
    • (?!\S*flecha\d*\.jpg) ensures that whatever is matched does not contain the pattern we want to avoid.
    • \S+\.jpg is what actually gets matched -- a string of non-whitespace characters followed by .jpg.

  3. Using multiple fixed-length negative lookbehinds

    This is a quick (but not-so-elegant) solution for situations where the pattern you don't want to match has a small number of possible lengths.

    For example, if we know that flecha is only followed by up to three digits, our regex could be:

    (?<!flecha)(?<!flecha[0-9])(?<!flecha[0-9][0-9])(?<!flecha[0-9][0-9][0-9])\.jpg
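
The three strategies above can be compared side by side in a short script. This is only a sketch: it uses Python's re module rather than Notepad++, the sample text and function names are made up for illustration, and the \w+ prefix in the third function is an addition so that whole file names are returned. The sample deliberately keeps every flecha name to at most three digits, since strategy 3 as written covers only that range.

```python
import re

# Hypothetical sample text; every flecha name has at most three digits
# so that strategy 3 (which only covers 0-3 digits) applies.
SAMPLE = "flecha.jpg casa.jpg flecha1.jpg img_flecha9.jpg moto.jpg flecha123.jpg"


def with_markers(text):
    """Strategy 1: mark unwanted names, search, then remove the marker."""
    # Python's re.sub writes group references as \1 where Notepad++ uses $1.
    marked = re.sub(r"(flecha[0-9]*)(\.jpg)", r"\1.\2", text)
    found = re.findall(r"\w+\.jpg", marked)
    # Undo the temporary marker (assumes ..jpg did not occur originally).
    restored = marked.replace("..jpg", ".jpg")
    return found, restored


def with_lookahead(text):
    """Strategy 2: negative lookahead anchored at the start of each name."""
    return re.findall(r"(?<!\S)(?!\S*flecha\d*\.jpg)\S+\.jpg", text)


def with_fixed_lookbehinds(text):
    """Strategy 3: one fixed-length lookbehind per possible length."""
    pattern = (
        r"\w+"  # added here only to capture the whole file name
        r"(?<!flecha)(?<!flecha[0-9])"
        r"(?<!flecha[0-9][0-9])(?<!flecha[0-9][0-9][0-9])\.jpg"
    )
    return re.findall(pattern, text)


found, restored = with_markers(SAMPLE)
print(found, restored == SAMPLE)
print(with_lookahead(SAMPLE))
print(with_fixed_lookbehinds(SAMPLE))
```

On this sample all three functions return casa.jpg and moto.jpg, and the marker strategy restores the original text. With longer digit runs such as flecha00501275696.jpg, strategy 3 would need more lookbehinds, while strategies 1 and 2 are unaffected.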

Josh Withee answered Sep 19 '22 12:09