New to Python, competent in a few other languages, but I can't see a 'snazzy' way of doing the following. I'm sure it's screaming out for a regex, but any solution I can come up with (using regex groups and whatnot) becomes insane quite quickly.
So, I have a string with html-like tags that I want to replace with actual html tags.
For example:
Hello, my name is /bJane/b.
Should become:
Hello, my name is <b>Jane</b>.
It might be combo'd with [i]talic and [u]nderline as well:
/iHello/i, my /uname/u is /b/i/uJane/b/i/u.
Should become:
<i>Hello</i>, my <u>name</u> is <b><i><u>Jane</b></i></u>.
Obviously a straight str.replace won't work, because every 2nd occurrence of a token needs the forward slash of a closing tag.
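A quick sketch of the problem, using the first example above (the variable names are only for illustration):

```python
text = "Hello, my name is /bJane/b."

# A plain replace cannot tell an opening token from a closing one,
# so both occurrences turn into opening tags.
naive = text.replace("/b", "<b>")
print(naive)  # Hello, my name is <b>Jane<b>.
```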
For clarity, if tokens are being combo'd, it's always first opened, first closed.
Many thanks!
PS: Before anybody gets excited, I know that this sort of thing should be done with CSS, blah, blah, blah, but I didn't write the software, I'm just reversing its output!
Maybe something like this can help:
import re


def text2html(text):
    """ Convert a text in a certain format to html.

    Examples:
    >>> text2html('Hello, my name is /bJane/b')
    'Hello, my name is <b>Jane</b>'
    >>> text2html('/iHello/i, my /uname/u is /b/i/uJane/u/i/b')
    '<i>Hello</i>, my <u>name</u> is <b><i><u>Jane</u></i></b>'
    """
    elem = []  # tokens that are currently open

    def to_tag(match_obj):
        match = match_obj.group(0)
        if match in elem:
            # Second occurrence of the token: close the tag.
            elem.remove(match)
            return "</{0}>".format(match[1])
        else:
            # First occurrence of the token: open the tag.
            elem.append(match)
            return "<{0}>".format(match[1])

    # Match only the /b, /i and /u tokens so other slashes are left alone.
    return re.sub(r'/[biu]', to_tag, text)


if __name__ == "__main__":
    import doctest
    doctest.testmod()
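For the first-opened, first-closed order from the question, the same toggle idea can be sketched with a set of open tags. This is only a sketch under assumptions: the name `toggle_tags` is made up here, and it assumes the only tokens in the text are /b, /i and /u.

```python
import re


def toggle_tags(text):
    """Replace /b, /i and /u tokens with opening or closing HTML tags."""
    open_tags = set()  # letters of the tags that are currently open

    def to_tag(match_obj):
        letter = match_obj.group(1)
        if letter in open_tags:
            # Tag is already open, so this token closes it.
            open_tags.remove(letter)
            return "</{0}>".format(letter)
        # Tag is not open yet, so this token opens it.
        open_tags.add(letter)
        return "<{0}>".format(letter)

    return re.sub(r"/([biu])", to_tag, text)


print(toggle_tags("/iHello/i, my /uname/u is /b/i/uJane/b/i/u."))
# <i>Hello</i>, my <u>name</u> is <b><i><u>Jane</b></i></u>.
```

Each token just flips the state of its own tag, so the closing order follows whatever the input uses. With overlapping tokens the result is not well-nested HTML, but it matches the output asked for in the question.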