I need to transform some text files into HTML. I'm stuck on transforming a list into an HTML unordered list. Example source:
some text in the document
* item 1
* item 2
* item 3
some other text
The output should be:
some text in the document
<ul>
<li>item 1</li>
<li>item 2</li>
<li>item 3</li>
</ul>
some other text
Currently, I have this:
import re
r = re.compile(r'\*(.*)\n')
r.sub(r'<li>\1</li>', the_text_document)  # raw string, so \1 is a backreference
which creates an HTML list without <ul> tags.
How can I identify the first and last items and surround them with <ul> tags?
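One way to stay with regular expressions, sketched here under the assumption that list items are consecutive lines starting with `*` and that lists are not nested, is to match each run of item lines as a single block and wrap that block. The names `LIST_BLOCK` and `wrap_lists` are made up for this illustration.

```python
import re

# One or more consecutive lines that start with "*" (hypothetical pattern name).
LIST_BLOCK = re.compile(r'(?:^\*.*(?:\n|$))+', re.MULTILINE)

def wrap_lists(text):
    """Replace every run of '*' lines with a <ul>...</ul> block."""
    def repl(match):
        # Each matched line starts with '*', so drop it and trim whitespace.
        items = [line[1:].strip() for line in match.group(0).splitlines()]
        lis = ''.join('<li>%s</li>\n' % item for item in items)
        return '<ul>\n' + lis + '</ul>\n'
    return LIST_BLOCK.sub(repl, text)

if __name__ == '__main__':
    source = ("some text in the document\n"
              "* item 1\n* item 2\n* item 3\n"
              "some other text\n")
    print(wrap_lists(source), end='')
```

Because the whole run is matched at once, there is no need to find the first and last items separately.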
Or use BeautifulSoup
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
edit
I apparently have to give you some hint on how to read documentation.
And many more things
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
Don't stop reading after the first sentence... The last one is pretty important, and so is what's in the middle.
In other words, you can create an empty document, let's say:
from bs4 import BeautifulSoup
soup = BeautifulSoup("<div></div>", "html.parser")  # name the parser explicitly
document = soup.div
Then read each line of your text, and do this whenever you have plain text:
document.append(line)
If the line starts with a `*`:
ul = soup.new_tag('ul')  # new_tag() lives on the BeautifulSoup object, not on a tag
document.append(ul)
document = ul
Then append each <li> to the document. Once you stop reading lines that start with *, pop back to the parent so that document points at the div again. Keep doing that; you can even do it recursively to nest <ul>s inside <ul>s.
Once you have parsed everything, you can do
str(document)
or
document.prettify()
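Put together, the steps above might look like the sketch below. It assumes the `bs4` package is installed and that lists are flat (no nesting); the function name `text_to_html` and the variable `current` are made up for this illustration.

```python
from bs4 import BeautifulSoup

def text_to_html(lines):
    """Build a <div> whose children are the text lines and <ul> lists."""
    soup = BeautifulSoup("<div></div>", "html.parser")
    div = soup.div
    current = div                      # tag that new content is appended to
    for line in lines:
        line = line.strip()
        if line.startswith('*'):
            if current is div:         # first item of a run: open a new <ul>
                ul = soup.new_tag('ul')
                div.append(ul)
                current = ul
            li = soup.new_tag('li')
            li.string = line[1:].strip()
            current.append(li)
        else:
            current = div              # left the list: go back to the parent
            div.append(line + '\n')
    return div

if __name__ == '__main__':
    sample = ["some text in the document", "* item 1", "* item 2",
              "* item 3", "some other text"]
    print(text_to_html(sample).prettify())
```

Here "popping the parent" is just resetting `current` to the div, which is enough as long as lists are not nested.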
Edit
Just realized that you weren't editing HTML but unformatted text. You could try using Markdown then.
http://daringfireball.net/projects/markdown/
You could just process your data line by line. This quick and dirty solution could probably be tidied up, but for your data it does the trick.
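with open('data.txt') as inf:
    in_list = False  # True while we are inside a <ul>
    for line in inf:
        line = line.strip()
        if not line.startswith('*'):
            if in_list:
                print('</ul>')
                in_list = False  # reset so </ul> is printed only once
            print(line)
        else:
            if not in_list:
                print('<ul>')
                in_list = True
            print(' <li>%s</li>' % line.split('*', 1)[1].strip())
    if in_list:  # close a list that runs to the end of the file
        print('</ul>')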
yields:
some text in the document
<ul>
<li>item 1</li>
<li>item 2</li>
<li>item 3</li>
</ul>
some other text
Depending on how complex your data is, or whether it contains repeated unordered lists and so on, this will require modification. You may want to look for a more general solution or adapt this starter code to your needs; only you can decide.
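As one possible generalization, here is a sketch that yields the output lines instead of printing them and handles several lists per file, including a list that ends the file. It still assumes flat lists, and the function name `convert` is made up for this illustration. It relies on `itertools.groupby` to group consecutive lines by whether they start with `*`.

```python
from itertools import groupby

def convert(lines):
    """Yield HTML lines; each run of '*' lines becomes one <ul> block."""
    stripped = (line.strip() for line in lines)
    for is_item, group in groupby(stripped, key=lambda s: s.startswith('*')):
        if is_item:
            yield '<ul>'
            for s in group:
                yield '<li>%s</li>' % s[1:].strip()
            yield '</ul>'
        else:
            yield from group   # ordinary text lines pass through unchanged

if __name__ == '__main__':
    sample = ["intro", "* a", "* b", "middle", "* c"]
    print('\n'.join(convert(sample)))
```

An open file object can be passed directly as `lines`, since it iterates line by line.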
Update: edited the <li> .. </li> print line to get rid of the * characters that were previously left in.