I have a big chunk of text that I'm checking for a specific pattern, which looks essentially like this:
unique_options_search = new Set([
"updates_EO_LTB",
"us_history",
"uslegacy",
etc., etc., etc.
]);
$input.typeahead({
source: [...unique_options_search],
autoSelect: false,
afterSelect: function(value)
My text variable is named 'html_page' and my start and end points look like this:
start = "new Set(["
end = "]);"
I thought I could find what I want with this one-liner:
r = re.findall("start(.+?)end",html_page,re.MULTILINE)
However, it's not returning anything at all. What is wrong here? I saw other examples online that worked fine.
There are multiple problems here.
First, "start(.+?)end" is a string literal, so the regex it describes matches the literal text start, then something, then the literal text end. The variables start and end play no part in it. You probably meant to write start + "(.+?)" + end instead.
Second, . does not match newlines by default in Python. re.MULTILINE does not help here, because it only changes the behavior of ^ and $ (see docs). Use re.DOTALL instead (see docs).
Third, start and end include characters with special meaning in a regex (e.g. ( and [). You have to make sure they are not treated specially. You can either escape them manually with the right number of \ or simply delegate that work to re.escape, which gives you a regular expression that literally matches what you need.
Combining all that together:
import re
html_page = """
unique_options_search = new Set([
"oecd_updates_EO_LTB",
"us_history",
"us_legacy",
etc., etc., etc.
]);
$input.typeahead({
source: [...unique_options_search],
autoSelect: false,
afterSelect: function(value)
"""
start = "new Set(["
end = "]);"
# r = re.findall("start(.+?)end",html_page,re.MULTILINE) # Old version
r = re.findall(re.escape(start) + "(.+?)" + re.escape(end), html_page, re.DOTALL) # New version
print(r)
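If you want to see each of the three problems in isolation, here is a small sketch. The shortened text variable and the result names are made up for illustration; only start and end come from the question.

```python
import re

# Shortened stand-in for html_page (illustrative only).
text = 'x = new Set([\n"a",\n"b",\n]);\nrest'
start = "new Set(["
end = "]);"

# Problem 1: the string "start(.+?)end" looks for the words "start" and "end",
# not for the contents of the variables.
literal = re.findall("start(.+?)end", text, re.DOTALL)

# Problem 2: without re.DOTALL, "." stops at newlines, so a span that
# crosses several lines never matches. re.MULTILINE does not change that.
pattern = re.escape(start) + "(.+?)" + re.escape(end)
no_dotall = re.findall(pattern, text, re.MULTILINE)
with_dotall = re.findall(pattern, text, re.DOTALL)

# Problem 3: without re.escape, "(" and "[" are read as regex syntax
# (a group and a character class), so the pattern means something else.
unescaped = re.findall(start + "(.+?)" + end, text, re.DOTALL)

print(literal)      # []
print(no_dotall)    # []
print(with_dotall)  # ['\n"a",\n"b",\n']
print(unescaped)    # []
```

Only the variant that uses both re.escape and re.DOTALL returns the block between the delimiters.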