I'm parsing an HTML document using HTMLParser, and I want to print the contents between the start and end of a p tag.
Here is my code snippet:
def handle_starttag(self, tag, attrs):
    if tag == 'p':
        print "TODO: print the contents"
Based on what @tauran posted, you probably want to do something like this:
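from HTMLParser import HTMLParser  # Python 2; in Python 3 the module is html.parser

class MyHTMLParser(HTMLParser):
    def print_p_contents(self, html):
        self.tag_stack = []  # currently open tags, innermost last
        self.feed(html)

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag.lower())

    def handle_endtag(self, tag):
        # Ignore stray end tags instead of popping an empty stack
        if self.tag_stack:
            self.tag_stack.pop()

    def handle_data(self, data):
        # Text outside any tag arrives with an empty stack, so check first
        if self.tag_stack and self.tag_stack[-1] == 'p':
            print data

p = MyHTMLParser()
p.print_p_contents('<p>test</p>')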
Now, you might want to push all <p> contents into a list and return that as the result, or something along those lines.
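A minimal sketch of that list-returning variant, assuming Python 3 (where the module is `html.parser` rather than `HTMLParser`). The names `PContentCollector`, `paragraphs` and `collect_p_contents` are made up for illustration and are not part of the library. It only collects text whose innermost open tag is `p`, same as the version above.

```python
from html.parser import HTMLParser


class PContentCollector(HTMLParser):
    """Hypothetical helper: collects text found directly inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.tag_stack = []   # currently open tags, innermost last
        self.paragraphs = []  # text chunks whose innermost open tag is <p>

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag)

    def handle_endtag(self, tag):
        # Guard against stray end tags so pop() never hits an empty stack.
        if self.tag_stack:
            self.tag_stack.pop()

    def handle_data(self, data):
        # Store the text instead of printing it.
        if self.tag_stack and self.tag_stack[-1] == 'p':
            self.paragraphs.append(data)


def collect_p_contents(html):
    """Hypothetical wrapper: parse html and return the collected <p> text."""
    parser = PContentCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


print(collect_p_contents('<p>test</p><div>skip</div><p>more</p>'))
```

Note that text inside a nested tag (say a `<b>` inside the `<p>`) is skipped here, because the top of the stack is then `b`, not `p`. If you want that text too, check whether `'p'` is anywhere in the stack instead of only on top.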
TIL: when working with libraries like this, you need to think in stacks!