I have a big chunk of text that I'm checking for a specific pattern, which looks essentially like this:
unique_options_search = new Set([
"updates_EO_LTB",
"us_history",
"uslegacy",
etc., etc., etc.
]);
$input.typeahead({
source: [...unique_options_search],
autoSelect: false,
afterSelect: function(value)
My text variable is named 'html_page', and my start and end points look like this:
start = "new Set(["
end = "]);"
I thought I could find what I want with this one-liner:
r = re.findall("start(.+?)end",html_page,re.MULTILINE)
However, it's not returning anything at all. What is wrong here? I saw other examples online that worked fine.
There are multiple problems here.
1. "start(.+?)end" in Python is a string describing a regex that literally matches "start", then something, and then literally matches "end". The variables start and end play no part here at all. You probably meant to write start + "(.+?)" + end instead.
2. The dot (.) in a Python regex does not match newlines by default. re.MULTILINE does not matter here; it only changes the behavior of ^ and $ (see the re module documentation). You should use re.DOTALL instead, which makes the dot match newlines as well.
3. start and end contain characters with special meaning in regex (e.g. ( and [). You have to make sure they're not treated specially. You can either escape them manually with the right number of \ or simply delegate that work to re.escape to get a regular expression that literally matches what you need.

Combining all that together:
import re
html_page = """
unique_options_search = new Set([
"oecd_updates_EO_LTB",
"us_history",
"us_legacy",
etc., etc., etc.
]);
$input.typeahead({
source: [...unique_options_search],
autoSelect: false,
afterSelect: function(value)
"""
start = "new Set(["
end = "]);"
# r = re.findall("start(.+?)end",html_page,re.MULTILINE) # Old version
r = re.findall(re.escape(start) + "(.+?)" + re.escape(end), html_page, re.DOTALL) # New version
print(r)
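Each of the three problems can also be checked in isolation. The sketch below runs the broken variants and the fixed one against a trimmed copy of the sample text; the shortened html_page and the variable names literal_words, no_dotall, unescaped and fixed are illustrative assumptions, not part of the original code. Note that without re.escape the pattern does not raise an error: "new Set([(.+?)]);" is still a valid regex (a group wrapping a character class), it just means something else and silently finds nothing.

```python
import re

# Trimmed stand-in for html_page (an assumption, shortened from the sample above)
html_page = """
unique_options_search = new Set([
"us_history",
"us_legacy",
]);
"""

start = "new Set(["
end = "]);"

# Problem 1: "start" and "end" inside the pattern are literal text, not the variables.
literal_words = re.findall("start(.+?)end", html_page, re.DOTALL)

# Problem 2: without re.DOTALL, "." cannot cross the newline right after "new Set([".
no_dotall = re.findall(re.escape(start) + "(.+?)" + re.escape(end), html_page)

# Problem 3: unescaped, "(" opens a group and "[(.+?)]" becomes a character class,
# so the regex only matches "new Set" + one of "(.+?)" + ";".
unescaped = re.findall(start + "(.+?)" + end, html_page, re.DOTALL)

# All three fixes together: concatenate, escape both delimiters, use re.DOTALL.
fixed = re.findall(re.escape(start) + "(.+?)" + re.escape(end), html_page, re.DOTALL)

print(literal_words)  # []
print(no_dotall)      # []
print(unescaped)      # []
print(fixed)          # ['\n"us_history",\n"us_legacy",\n']
```

Only the last call returns the captured block; the lazy (.+?) stops at the first "]);" after the start marker.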