I have a file that contains this:
<html>
<head>
<title> Hello! - {{ today }}</title>
</head>
<body>
{{ runner_up }}
avasd
{{ blabla }}
sdvas
{{ oooo }}
</body>
</html>
What is the best or most Pythonic way to extract the {{today}}, {{runner_up}}, etc.?
I know it can be done with splits/regular expressions, but I wondered if there were another way.
PS: consider the data loaded in a variable called thedata.
Edit: I think the HTML example was a bad choice, because it pointed some commenters toward BeautifulSoup. So here is new input data:
Fix grammatical or {{spelling}} errors.
Clarify meaning without changing it.
Correct minor {{mistakes}}.
Add related resources or links.
Always respect the original {{author}}.
Output:
spelling
mistakes
author
Mmkay, well here's a generator solution that seems to work well for me. You can also provide different open and close tags if you like.
def get_tags(s, open_delim='{{', close_delim='}}'):
    while True:
        # Search for the next open delimiter in the source text
        start = s.find(open_delim)
        # Look for the close delimiter only after the open one, so a stray
        # close delimiter earlier in the text does not stop the search
        end = s.find(close_delim, start + len(open_delim)) if start != -1 else -1
        # We found a complete tag
        if -1 < start < end:
            # Skip the length of the open delimiter
            start += len(open_delim)
            # Spit out the tag
            yield s[start:end].strip()
            # Truncate string to start from last match
            s = s[end + len(close_delim):]
        else:
            return
Run against your target input like so:
# prints today, runner_up, blabla, oooo (one per line)
for tag in get_tags(thedata):
    print(tag)
Edit: it also works against your new example :). In my obviously quick testing, it also seemed to handle malformed tags in a reasonable way, though I make no guarantees of its robustness!
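For comparison, the regular-expression route the question mentions could look roughly like the sketch below. It is an alternative, not the generator above: find_tags is a made-up name, and the delimiter parameters simply mirror the ones get_tags takes. The non-greedy (.*?) stops at the first close delimiter, and re.escape keeps delimiters containing special characters from being read as regex syntax.

```python
import re

def find_tags(text, open_delim="{{", close_delim="}}"):
    """Return the stripped contents of every open_delim ... close_delim pair."""
    # Escape the delimiters so characters like { or % are matched literally
    pattern = re.escape(open_delim) + r"(.*?)" + re.escape(close_delim)
    # DOTALL lets a tag span more than one line
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]

thedata = """Fix grammatical or {{spelling}} errors.
Clarify meaning without changing it.
Correct minor {{mistakes}}.
Add related resources or links.
Always respect the original {{author}}."""

print(find_tags(thedata))                         # ['spelling', 'mistakes', 'author']
print(find_tags("Hello <% name %>!", "<%", "%>"))  # ['name']
# A stray close delimiter and an unclosed tag are both ignored
print(find_tags("stray }} then {{ ok }} and {{ unclosed"))  # ['ok']
```

Unlike the generator, this builds the whole list at once; re.finditer would be the lazy equivalent if the input is large.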