matching all characters in any order in regex

Tags: python, regex

I'm a regex newbie, but I understand how to match any one of a set of characters (e.g. [abc] will match any of a, b or c). I also believe "abc" will match abc exactly, in that order.

However, how do I construct a regex query that will match all the characters abc in any order? So for example, I want it to match "cab" or "bracket". I'm using Python as my scripting language (not sure if this matters or not).

asked Nov 14 '11 14:11 by steve8918


2 Answers

In Python, I wouldn't use a regular expression for this purpose, but rather a set:

>>> chars = set("abc")
>>> chars.issubset("bracket")
True
>>> chars.issubset("fish")
False
>>> chars.issubset("bad")
False

Regular expressions are useful, but there are situations where different tools are more appropriate.
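
As a minimal sketch, the set check above can be wrapped in a small helper. The name `contains_all` is hypothetical and chosen only for illustration; it is not part of the standard library:

```python
def contains_all(required, text):
    """Return True if every character in `required` occurs somewhere in `text`."""
    # set.issubset accepts any iterable, so `text` can be passed as a plain string.
    return set(required).issubset(text)


print(contains_all("abc", "bracket"))  # True
print(contains_all("abc", "cab"))      # True
print(contains_all("abc", "fish"))     # False
```

Note that a set only tracks distinct characters, so this checks presence, not how many times each character occurs.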

answered Oct 04 '22 03:10 by Sven Marnach


This can be done with lookahead assertions:

^(?=.*a)(?=.*b)(?=.*c)

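matches if your string contains at least one occurrence of each of a, b and c, in any order. Each lookahead is checked from the start of the string and consumes no characters.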
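
For illustration, here is one way the lookahead pattern might be used from Python and generated for an arbitrary set of characters. The helper name `all_chars_pattern` is hypothetical, and using `re.DOTALL` (so that `.` also crosses newlines) is an assumption about the desired behaviour, not something the question specifies:

```python
import re


def all_chars_pattern(required):
    """Build a compiled pattern with one lookahead per required character."""
    # re.escape guards characters such as "." or "*" that are special in a regex.
    lookaheads = "".join("(?=.*{})".format(re.escape(ch)) for ch in required)
    # DOTALL lets "." cross newlines, so multi-line input is handled too.
    return re.compile("^" + lookaheads, re.DOTALL)


pattern = all_chars_pattern("abc")
print(pattern.pattern)                 # ^(?=.*a)(?=.*b)(?=.*c)
print(bool(pattern.match("bracket")))  # True
print(bool(pattern.match("cab")))      # True
print(bool(pattern.match("fish")))     # False
```

Because the lookaheads are zero-width, a successful match is an empty match at position 0; only its truthiness is of interest here.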

But as you can see, that's not really what regexes are good at.

I would have done:

if all(char in mystr for char in "abc"):
    pass  # do something

Checking for speed:

>>> import timeit
>>> timeit.timeit(stmt='chars.issubset("bracket");chars.issubset("notinhere")',
... setup='chars=set("abc")')
1.3560583674019995
>>> timeit.timeit(stmt='all(char in "bracket" for char in s);all(char in "notinhere" for char in s)', 
... setup='s="abc"')
1.4581878714681409
>>> timeit.timeit(stmt='r.match("bracket"); r.match("notinhere")', 
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)")')
1.0582279123082117

Hey, look, the regex wins! This even holds true when looking for more characters:

>>> timeit.timeit(stmt='chars.issubset("bracketed");chars.issubset("notinhere")', 
... setup='chars=set("abcde")')
1.4316702294817105
>>> timeit.timeit(stmt='all(char in "bracketed" for char in s);all(char in "notinhere" for char in s)', 
... setup='s="abcde"')
1.6696223364866682
>>> timeit.timeit(stmt='r.match("bracketed"); r.match("notinhere")', 
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)")')
1.1809254199004044
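
Timings like these depend on the machine and the Python version, so it can be worth rerunning them. The following is a rough sketch of a script that times the three approaches side by side; the function names (`by_set`, `by_all`, `by_regex`, `compare`) and the default `number=100_000` are assumptions made for this example, not part of the measurements above:

```python
import re
import timeit

REQUIRED = "abc"
CHARS = set(REQUIRED)
PATTERN = re.compile("(?=.*a)(?=.*b)(?=.*c)")


def by_set(text):
    return CHARS.issubset(text)


def by_all(text):
    return all(ch in text for ch in REQUIRED)


def by_regex(text):
    return PATTERN.match(text) is not None


def compare(number=100_000):
    """Time each approach on one matching and one non-matching string."""
    results = {}
    for func in (by_set, by_all, by_regex):
        results[func.__name__] = timeit.timeit(
            lambda: (func("bracket"), func("notinhere")), number=number
        )
    return results


for name, seconds in compare().items():
    print("{:<8} {:.4f} s".format(name, seconds))
```

Wrapping each approach in a function adds call overhead, so the absolute numbers will not be directly comparable to the statement-based timings above; only the relative ordering on a given machine is meaningful.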

answered Oct 04 '22 04:10 by Tim Pietzcker