
Python regular expressions matching within set

Tags:

python

regex

While testing on http://gskinner.com/RegExr/ (an online regex tester), the regex [jpg|bmp] returns results when either jpg or bmp exists. However, when I run this regex in Python, it only returns j or b. How do I make the regex match the whole word "jpg" or "bmp" inside the set? This may have been asked before, but I was not sure how to phrase the question to find the answer. Thanks!

Here is the whole regex, if it helps:

"http://www\S*(?i)\\.(jpg|bmp|png|gif|img|jng|jpeg|jpe|gif|giff)"

It's basically just meant to look for pictures in a URL.

Trent asked Aug 15 '11 10:08



2 Answers

Use (jpg|bmp) instead of square brackets.

Square brackets mean: match a single character from the set inside the brackets.

Edit: you might want something like this: [^ ].*?(jpg|bmp) or [^ ].*?\.(jpg|bmp)
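To make the difference concrete, here is a minimal Python sketch comparing the two forms. The sample string and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample text, chosen so only the extensions contain set members.
text = "x.jpg y.bmp"

# Character class: each match is exactly ONE character out of j, p, g, |, b, m.
class_matches = re.findall(r"[jpg|bmp]", text)

# Group with alternation: each match is the whole word "jpg" or "bmp".
group_matches = re.findall(r"(jpg|bmp)", text)

print(class_matches)  # ['j', 'p', 'g', 'b', 'm', 'p']
print(group_matches)  # ['jpg', 'bmp']
```

Note that inside the brackets the | is not an alternation operator; it is just one more literal character in the set.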

MByD answered Sep 23 '22 02:09


When you use [], you create a character class that contains all the characters between the brackets.

So you are not matching jpg or bmp; you are matching a single character: either a j or a p or a g or a | ...

You should add an anchor for the end of the string ($) to your regex:

http://www\S*(?i)\\.(jpg|bmp|png|gif|img|jng|jpeg|jpe|gif|giff)$
          ^      ^^

If you need double escaping, then apply it everywhere in your pattern (note the single backslash in \S versus the double backslash in \\. above):

http://www\\S*(?i)\\.(jpg|bmp|png|gif|img|jng|jpeg|jpe|gif|giff)$

The $ anchor ensures that the pattern checks for the file extension at the very end of the string.
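As a rough sketch of how the anchored pattern could look in Python: a raw string avoids the double-escaping question entirely, and passing re.IGNORECASE replaces the inline (?i), which newer Python versions reject when it is not at the start of the pattern. The names IMAGE_URL and image_extension and the example.com URLs are hypothetical, and the duplicate gif alternative has been dropped.

```python
import re

# Raw string: one backslash per escape. "$" anchors the extension to the end.
IMAGE_URL = re.compile(
    r"http://www\S*\.(jpg|bmp|png|gif|img|jng|jpeg|jpe|giff)$",
    re.IGNORECASE,
)

def image_extension(url):
    """Return the matched image extension, or None if the URL does not end in one."""
    match = IMAGE_URL.search(url)
    return match.group(1) if match else None

print(image_extension("http://www.example.com/a.JPG"))      # JPG
print(image_extension("http://www.example.com/a.jpeg"))     # jpeg
print(image_extension("http://www.example.com/a.jpg?x=1"))  # None
```

Without the $, the last URL would also match, because .jpg appears in the middle of it.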

stema answered Sep 19 '22 02:09