If-Then-Else Conditionals in Regular Expressions. A special construct (?ifthen|else) allows you to create conditional regular expressions. If the if part evaluates to true, then the regex engine will attempt to match the then part. Otherwise, it attempts the else part instead.
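As a rough illustration in Python: the re module supports a conditional of the form (?(id/name)yes|no), where the "if" part can only test whether an earlier group matched (lookaround conditions, which some other engines allow, are not available there). The sketch below adapts the email example from the Python documentation. The sample strings and the variable names are just for illustration.

```python
import re

# If group 1 (an opening "<") matched, require a closing ">";
# otherwise require the end of the string.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

bracketed = pattern.match('<user@host.com>')   # "<" seen, ">" required and present
bare = pattern.match('user@host.com')          # no "<", so "$" is required
unclosed = pattern.match('<user@host.com')     # "<" seen but ">" is missing
stray = pattern.match('user@host.com>')        # no "<", but text continues past the address

print(bool(bracketed), bool(bare), bool(unclosed), bool(stray))
```

The first two calls should give a match and the last two should not.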
Literal Characters and Sequences. For instance, you might need to search for a dollar sign ("$") as part of a price list, or in a computer program as part of a variable name. Since the dollar sign is a metacharacter which means "end of line" in regex, you must escape it with a backslash to use it literally.
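A small sketch of the escaping rule in Python, assuming a made-up price list string. It also shows re.escape(), which adds the backslashes for you when the literal text comes from somewhere else.

```python
import re

price_list = "Widget: $19.99, Gadget: $5.00"  # hypothetical sample text

# \$ matches a literal dollar sign; an unescaped $ would mean "end of line".
prices = re.findall(r'\$\d+\.\d{2}', price_list)
print(prices)

# re.escape() backslash-escapes the regex metacharacters in a literal string.
escaped = re.escape("$5.00")
found = re.search(escaped, price_list)
print(escaped, found.group())
```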
Sure. Just look them up as normal and check for matches. Note that re.match only produces a match if the expression is found at the beginning of the string.
if re.match(regex, content):
    ...  # do something with the match
You could also use re.search, depending on how you want it to match.
if re.search(r'pattern', string):
    ...
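To make the difference concrete, here is a minimal comparison of the two calls on one sample string (the string and variable names are only for illustration):

```python
import re

text = "seeking a great perhaps"

at_start = re.match(r'great', text)    # None: "great" is not at position 0
anywhere = re.search(r'great', text)   # finds "great" later in the string
prefix = re.match(r'seek', text)       # matches: "seek" starts the string

print(at_start, anywhere.span(), prefix.group())
```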
Simple if-regex example:
if re.search(r'ing\b', "seeking a great perhaps"):  # does any word end with "ing"?
    print("yes")
Complex if-regex example (pattern check, extract a substring, case insensitive):
match_object = re.search(r'^OUGHT (.*) BE$', "ought to be", flags=re.IGNORECASE)
if match_object:
    assert "to" == match_object.group(1)  # what's between "ought" and "be"?
Notes:
Use re.search(), not re.match(). re.match() only matches at the start of the string, a confusing convention if you ask me. If you do want a string-starting match, use a caret or \A with re.search instead: re.search(r'^...', ...)
Use raw string syntax r'pattern' for the first parameter. Otherwise you would need to double up backslashes, as in re.search('ing\\b', ...)
In these examples, '\\b' or r'\b' is a special sequence meaning word boundary for regex purposes. Not to be confused with '\b' or '\x08', the backspace character.
re.search() returns None if it doesn't find anything, which is always falsy.
re.search() returns a Match object if it finds anything, which is always truthy.
A group is what matched inside parentheses.
Group numbering starts at 1.
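The notes above can be checked with a short script. This is only a sketch; the pattern with an extra capture group and the "sings" sample are my own additions for illustration.

```python
import re

# '\b' in a normal string is the backspace character; r'\b' is backslash + b.
assert '\b' == '\x08'
assert r'\b' == '\\b' and len(r'\b') == 2

m = re.search(r'(\w+)ing\b', "seeking a great perhaps")
if m:                       # a Match object is truthy
    whole = m.group(0)      # group 0 is the entire match
    stem = m.group(1)       # group 1 is the first parenthesized part
    print(whole, stem)

# "ing" is followed by "s" here, so there is no word boundary after it.
no_match = re.search(r'ing\b', "sings")
print(no_match is None)
```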
The REPL makes it easy to learn APIs. Just run python, create an object and then ask for help:
$ python
>>> import re
>>> help(re.compile(r''))
at the command line shows, among other things:
search(...)
    search(string[, pos[, endpos]]) --> match object or None.
    Scan through string looking for a match, and return a corresponding MatchObject instance. Return None if no position in the string matches.
so you can do
regex = re.compile(regex_txt, re.IGNORECASE)
match = regex.search(content) # From your file reading code.
if match is not None:
    ...  # use match
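Here is a self-contained version of that idea, including the optional pos argument from the help text. The pattern text and the content string are made up for the example, since the original ones come from the asker's own code.

```python
import re

regex_txt = r"ought"                             # hypothetical pattern text
content = "You OUGHT to be. You ought to see."   # hypothetical file content

regex = re.compile(regex_txt, re.IGNORECASE)

first = regex.search(content)
if first is not None:
    # Passing pos resumes the scan after the first hit.
    second = regex.search(content, first.end())
else:
    second = None

print(first.group(), second.group())
```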
Incidentally, regex_txt = "facebook.com" has a . which matches any character, so re.compile("facebook.com").search("facebookkcom") is not None is true. Maybe use this instead:
regex_txt = r"(?i)facebook\.com"
The \. matches a literal "." character instead of treating . as a special regular expression operator.
The r"..." bit means that the regular expression compiler gets the escape in \. instead of the Python parser interpreting it.
The (?i) makes the regex case-insensitive like re.IGNORECASE but self-contained.
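A quick way to see the difference is to run both patterns side by side. The names loose and strict and the "Visit FACEBOOK.COM" sample are only for illustration.

```python
import re

loose = re.compile("facebook.com")          # . matches any character
strict = re.compile(r"(?i)facebook\.com")   # \. matches only a literal dot

loose_hit = loose.search("facebookkcom") is not None
strict_hit = strict.search("facebookkcom") is not None
upper_hit = strict.search("Visit FACEBOOK.COM") is not None

print(loose_hit, strict_hit, upper_hit)
```

Only the escaped pattern rejects "facebookkcom", and the inline (?i) flag lets it accept the upper-case spelling.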
First you compile the regex, then you have to use it with match, search, or some other method to actually run it against some input.
import os
import re
import shutil

regex_txt = r"facebook\.com"  # example pattern; substitute your own

def test():
    os.chdir("C:/Users/David/Desktop/Test/MyFiles")
    files = os.listdir(".")
    os.mkdir("C:/Users/David/Desktop/Test/MyFiles2")
    # Compile once, then reuse the pattern for every line.
    pattern = re.compile(regex_txt, re.IGNORECASE)
    for x in files:
        with open(x, 'r') as input_file:
            for line in input_file:
                if pattern.search(line):
                    # Copy the file on the first matching line, then stop reading it.
                    shutil.copy(x, "C:/Users/David/Desktop/Test/MyFiles2")
                    break
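If you want something you can point at any pair of folders, the same idea can be wrapped in a function. This is a sketch and not the original poster's code: copy_matching_files and its parameters are names I made up, it skips subdirectories, and it tolerates an existing destination folder.

```python
import re
import shutil
from pathlib import Path

def copy_matching_files(src_dir, dst_dir, regex_txt):
    """Copy every regular file in src_dir that has a line matching regex_txt."""
    pattern = re.compile(regex_txt, re.IGNORECASE)
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)   # no error if it already exists
    copied = []
    for path in sorted(src.iterdir()):
        if not path.is_file():               # skip subdirectories
            continue
        with open(path, 'r', errors='ignore') as input_file:
            # any() stops reading at the first matching line.
            if any(pattern.search(line) for line in input_file):
                shutil.copy(path, dst)
                copied.append(path.name)
    return copied
```

With the folders from the question, the call would look like copy_matching_files("C:/Users/David/Desktop/Test/MyFiles", "C:/Users/David/Desktop/Test/MyFiles2", r"facebook\.com"), and it returns the names of the files it copied.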