
Regex pattern that does not match certain extensions?

I have written this pattern:

^.*\.(?!jpg$|png$).+$

However there is a problem - this pattern matches file.name.jpg (2 dots)

It works correctly (does not match) on filename.jpg. I am trying to figure out how to make it not match ANY .jpg file, even if the file's name has two or more dots in it. I tried using a lookbehind, but Python complains that it is not fixed width (I'm not exactly sure what that means, but the file name will be of variable length).

asked Apr 07 '12 05:04 by paradigm111



2 Answers

This should work: ^.*\.(?!jpg$|png$)[^.]+$
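A quick way to check this in Python, as a minimal sketch (the file names are made-up test inputs, not from the question). In the original pattern the trailing .+ may itself contain dots, so the lookahead can be satisfied at an earlier dot; restricting the tail to [^.]+ forces the lookahead to be applied at the last dot:

```python
import re

# Pattern from the question: the trailing ".+" can swallow "name.jpg",
# so the lookahead is only checked at the first dot.
original = re.compile(r'^.*\.(?!jpg$|png$).+$')

# Pattern from this answer: the tail may not contain a dot, so the
# lookahead is always checked against the final extension.
fixed = re.compile(r'^.*\.(?!jpg$|png$)[^.]+$')

# Hypothetical file names used only for illustration.
names = ['filename.jpg', 'file.name.jpg', 'photo.png', 'notes.txt', 'archive.tar.gz']

original_hits = [name for name in names if original.match(name)]
fixed_hits = [name for name in names if fixed.match(name)]

print(original_hits)  # still contains 'file.name.jpg'
print(fixed_hits)     # → ['notes.txt', 'archive.tar.gz']
```

Note that neither pattern matches a name with no dot at all, since both require a literal dot.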

answered Nov 02 '22 00:11 by bereal


If you only care that the string doesn't end with .jpg or .png, you can use this:

^.+$(?<!\.jpg)(?<!\.png)

The ^.+ isn't strictly necessary, but depending on how the JSON parser is coded you might need to force the regex to consume the whole string. If you're using the regex for other validations as well, you might want something more elaborate, like:

^\w+(?:\.\w+)+$(?<!\.jpg)(?<!\.png)
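Both patterns can be tried out in Python with a short sketch like the one below. The file names are hypothetical examples chosen to show where the two variants differ:

```python
import re

# Two separate fixed-width lookbehinds placed after the end anchor.
loose = re.compile(r'^.+$(?<!\.jpg)(?<!\.png)')

# Stricter variant: word characters separated by dots, at least one dot.
strict = re.compile(r'^\w+(?:\.\w+)+$(?<!\.jpg)(?<!\.png)')

# Hypothetical inputs for illustration only.
names = ['file.name.jpg', 'photo.png', 'notes.txt', 'README', 'my file.txt']

loose_hits = [name for name in names if loose.match(name)]
strict_hits = [name for name in names if strict.match(name)]

print(loose_hits)   # → ['notes.txt', 'README', 'my file.txt']
print(strict_hits)  # → ['notes.txt']
```

The loose version only rejects the two extensions, so it also accepts names without any extension; the strict version additionally rejects names with no dot or with characters outside \w.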

You probably tried to use something like (?<!\.jpg|\.jpeg|\.png). Python's regex flavor is one of the most restrictive when it comes to lookbehinds: the whole lookbehind must match one fixed width, so alternatives are accepted only if they all have the same length. (?<!\.jpg|\.png) compiles, but adding \.jpeg triggers the error. PHP and Ruby 1.9+ are more lenient: each alternative has to have a fixed length, but they don't have to be the same length, so (?<!\.jpg|\.jpeg|\.png) works there. Just don't try to factor out the dot in those flavors, as in (?<!\.(?:jpg|jpeg|png)); the alternation has to be at the top level of the lookbehind.
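The fixed-width rule in Python's built-in re module can be seen with a small sketch. The helper name compiles is made up for this example; it just reports whether re.compile accepts a pattern:

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Alternatives of different widths inside one lookbehind are rejected.
mixed_width = compiles(r'(?<!\.jpg|\.jpeg|\.png)$')

# Splitting them into separate lookbehinds, each of fixed width, is accepted.
separate = compiles(r'(?<!\.jpg)(?<!\.jpeg)(?<!\.png)$')

print(mixed_width)  # → False
print(separate)     # → True
```

Chaining one lookbehind per extension is the usual workaround when the extensions have different lengths.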

Java would accept the factored-out version because it does a little more work at compile time to determine the maximum number of characters the lookbehind might need to match. The lookbehind expression needs to be fairly simple though, and it can't use the + or * quantifiers. Finally, the .NET and JGSoft flavors place no restrictions at all on lookbehinds. But Python makes a very simple-minded attempt to figure out the exact number of characters the lookbehind needs to match, generating that cryptic error message when it fails.

answered Nov 01 '22 23:11 by Alan Moore