I'm a regex newbie, but I understand how to match any one of a set of characters (e.g. [abc] will match any of a, b or c). I also believe "abc" will match abc exactly.
However, how do I construct a regex query that matches any string containing all of the characters a, b and c, in any order? For example, I want it to match "cab" or "bracket". I'm using Python as my scripting language (not sure if this matters or not).
To match a character that has a special meaning in regex, escape it by prefixing it with a backslash ( \ ). E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" . To match a literal "\" (backslash), use \\ .
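As a minimal Python sketch of the escaping rule above (the sample strings are made up for illustration): raw strings keep the backslashes readable, and re.escape adds them for you when the text to match is not known in advance.

```python
import re

# Raw strings (r"...") stop Python itself from interpreting the backslash.
dot_match = re.search(r"\.", "a.b")           # \. matches a literal dot
plus_match = re.search(r"\+", "1+1")          # \+ matches a literal plus
paren_match = re.search(r"\(", "f(x)")        # \( matches a literal parenthesis
backslash_match = re.search(r"\\", "C:\\tmp") # \\ matches one literal backslash

# Unescaped, "." is a wildcard; escaped, it only matches a real dot.
wildcard_match = re.search(r".", "abc")
literal_dot_miss = re.search(r"\.", "abc")    # None: "abc" contains no dot

# re.escape() builds the escaped pattern for arbitrary text.
escaped = re.escape("1+1=2?")
exact_match = re.fullmatch(escaped, "1+1=2?")
```

Without re.escape, the pattern "1+1=2?" would treat + and ? as quantifiers and match strings such as "1111=" instead.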
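^ means "Match the start of the string" (the position before the first character in the string), and $ means "Match the end of the string" (the position after the last character in the string). Both are called anchors and ensure that the entire string is matched instead of just a substring.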
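A small Python sketch of what the anchors change, using made-up sample strings. One Python-specific detail: $ also matches just before a trailing newline, so re.fullmatch is the stricter choice when the whole string must match.

```python
import re

unanchored = re.compile(r"abc")
anchored = re.compile(r"^abc$")

# Without anchors, search() finds "abc" anywhere inside the string.
found_inside = unanchored.search("xxabcxx") is not None

# With ^ and $, the whole string has to be "abc".
anchored_inside = anchored.search("xxabcxx") is not None
anchored_exact = anchored.search("abc") is not None

# In Python, $ tolerates one trailing newline; fullmatch() does not.
dollar_allows_newline = anchored.search("abc\n") is not None
fullmatch_rejects_newline = unanchored.fullmatch("abc\n") is None
```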
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9. (a-z0-9) -- A capturing group that matches the literal text a-z0-9 .
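The difference between the two bracket types can be checked directly in Python. This is only an illustrative sketch; the test strings are invented.

```python
import re

# [a-z0-9] is a character class: exactly one character from either range.
class_letter = re.fullmatch(r"[a-z0-9]", "q")
class_digit = re.fullmatch(r"[a-z0-9]", "7")
class_two_chars = re.fullmatch(r"[a-z0-9]", "q7")   # None: the class matches one character

# (a-z0-9) is a capturing group around the literal text "a-z0-9".
group_literal = re.fullmatch(r"(a-z0-9)", "a-z0-9")
group_letter = re.fullmatch(r"(a-z0-9)", "q")        # None: "q" is not the literal text
captured = group_literal.group(1) if group_literal else None
```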
Basically (0+1)* matches any sequence of ones and zeroes. So, in your example, (0+1)*1(0+1)* should match any sequence that contains a 1. It would not match 000 , but it would match 010 , 1 , 111 etc. Here (0+1) means 0 OR 1, and 1* means any number of ones (including none).
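Note that (0+1) is formal-language notation, where + means OR. In Python's re syntax + is a quantifier, so the same idea has to be written with | (or a character class). A sketch, where has_a_one is a hypothetical helper name:

```python
import re

# Formal notation (0+1)*1(0+1)* translated to Python: "+" (union) becomes "|".
contains_one = re.compile(r"(?:0|1)*1(?:0|1)*")

def has_a_one(bits):
    """Return True if bits is a string of 0s and 1s containing at least one 1."""
    return contains_one.fullmatch(bits) is not None

# Pasting the formal notation into Python unchanged gives a different pattern:
# there, 0+ means "one or more zeroes", so "010" is no longer accepted.
literal_reading = re.fullmatch(r"(0+1)*1(0+1)*", "010")
```

An equivalent and shorter spelling of the translated pattern is [01]*1[01]* .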
In Python, I wouldn't use a regular expression for this purpose, but rather a set:
>>> chars = set("abc")
>>> chars.issubset("bracket")
True
>>> chars.issubset("fish")
False
>>> chars.issubset("bad")
False
Regular expressions are useful, but there are situations where different tools are more appropriate.
This can be done with lookahead assertions:
^(?=.*a)(?=.*b)(?=.*c)
matches if your string contains at least one occurrence of a, b and c.
But as you can see, that's not really what regexes are good at.
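For completeness, here is one way the lookahead pattern could be built for an arbitrary set of characters. This is a sketch, not a recommendation: contains_all_regex is a hypothetical helper name, and re.DOTALL is added as an assumption so that . also crosses newlines in multi-line input.

```python
import re

def contains_all_regex(required, text):
    """Return True if every character in required occurs somewhere in text."""
    # One lookahead per required character; re.escape() guards
    # characters such as "." or "+" that are special in a regex.
    pattern = "".join(f"(?=.*{re.escape(ch)})" for ch in required)
    # match() anchors at the start of the string, like the leading ^ above.
    return re.match(pattern, text, re.DOTALL) is not None

print(contains_all_regex("abc", "cab"))
print(contains_all_regex("abc", "bracket"))
print(contains_all_regex("abc", "fish"))
```

Each lookahead checks its character independently from the start of the string and consumes nothing, which is why the order of a, b and c in the input does not matter.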
I would have done:
if all(char in mystr for char in "abc"):
    pass  # do something
Checking for speed:
>>> import timeit
>>> timeit.timeit(stmt='chars.issubset("bracket");chars.issubset("notinhere")',
... setup='chars=set("abc")')
1.3560583674019995
>>> timeit.timeit(stmt='all(char in "bracket" for char in s);all(char in "notinhere" for char in s)',
... setup='s="abc"')
1.4581878714681409
>>> timeit.timeit(stmt='r.match("bracket"); r.match("notinhere")',
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)")')
1.0582279123082117
Hey, look, the regex wins! This even holds true for longer search strings:
>>> timeit.timeit(stmt='chars.issubset("bracketed");chars.issubset("notinhere")',
... setup='chars=set("abcde")')
1.4316702294817105
>>> timeit.timeit(stmt='all(char in "bracketed" for char in s);all(char in "notinhere" for char in s)',
... setup='s="abcde"')
1.6696223364866682
>>> timeit.timeit(stmt='r.match("bracketed"); r.match("notinhere")',
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)")')
1.1809254199004044
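Timings like these depend heavily on the machine, the Python version and the inputs, so it is worth re-running the comparison yourself. A self-contained sketch follows; the function names and the repeat/number values are arbitrary choices made for this example, not part of the measurements above.

```python
import re
import timeit

REQUIRED = "abc"
SAMPLES = ["bracket", "notinhere"]

chars = set(REQUIRED)
lookahead = re.compile("".join(f"(?=.*{re.escape(c)})" for c in REQUIRED))

def with_set():
    return [chars.issubset(s) for s in SAMPLES]

def with_all():
    return [all(c in s for c in REQUIRED) for s in SAMPLES]

def with_regex():
    return [lookahead.match(s) is not None for s in SAMPLES]

def best_time(func, number=20_000, repeat=5):
    # Take the minimum of several runs to reduce noise from other processes.
    return min(timeit.repeat(func, number=number, repeat=repeat))

if __name__ == "__main__":
    for func in (with_set, with_all, with_regex):
        print(f"{func.__name__}: {best_time(func):.4f} s")
```

All three functions should return the same answers for the same inputs; only the running time differs, and the ranking may change with longer strings or more required characters.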