When using the HTMLParser class in Python, is it possible to abort processing from within a handle_* function? I get all the data I need early in the processing, so it seems like a waste to continue. Below is an example that extracts the meta description from a document.
from HTMLParser import HTMLParser  # Python 2; in Python 3: from html.parser import HTMLParser

class MyParser(HTMLParser):
    # The callback must be named handle_starttag, or the parser never calls it
    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            attrs = dict(attrs)
            # Attribute values can be None, hence the "or ''" guard
            if (attrs.get('name') or '').lower() == 'description':
                print(attrs.get('content'))
                # Would like to tell the parser to stop now,
                # since I have all the data that I need
You can raise an exception from the handler and wrap your .feed() call in a try block.
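Here is a minimal sketch of the exception approach. It assumes Python 3's html.parser, with a fallback to the Python 2 module used in the question. StopParsing, MetaDescriptionParser and extract_description are names made up for illustration and are not part of the library:

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as in the question


class StopParsing(Exception):
    """Hypothetical exception used only to abort parsing early."""


class MetaDescriptionParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        # Attribute values can be None, hence the "or ''" guard.
        if (attrs.get('name') or '').lower() == 'description':
            self.description = attrs.get('content')
            # Unwind out of feed(); the rest of the input is never parsed.
            raise StopParsing()


def extract_description(html):
    parser = MetaDescriptionParser()
    try:
        parser.feed(html)
    except StopParsing:
        pass  # expected: the handler found what it needed
    return parser.description
```

Storing the value on the parser instance instead of printing it lets the caller read the result after the exception has been caught. Using a dedicated exception class keeps genuine errors from being swallowed by the except clause.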
You can also call self.reset() once you decide that you are done. I have not actually tried it, but the documentation describes it as "Reset the instance. Loses all unprocessed data.", which sounds like precisely what you need.