Abort HTMLParser processing in Python

Question

When using the HTMLParser class in Python, is it possible to abort processing within a handle_* function? Early in the processing, I get all the data I need, so it seems like a waste to continue processing. There's an example below of extracting the meta description for a document.

from HTMLParser import HTMLParser  # Python 3: from html.parser import HTMLParser

class MyParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            # attrs is a list of (name, value) pairs; a value may be None
            attrs = dict(attrs)
            if (attrs.get('name') or '').lower() == 'description':
                print(attrs.get('content'))
                # Would like to tell the parser to stop now,
                # since I have all the data that I need

shylent · Accepted Answer

You can raise an exception and wrap your .feed() call in a try block.
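As a minimal sketch of the exception approach: the handler raises a custom exception as soon as the description is found, and the caller catches it around .feed(). The names StopParsing, MetaDescriptionParser and extract_description are hypothetical, chosen only for illustration, and the import uses the Python 3 module name rather than the Python 2 one from the question.

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class StopParsing(Exception):
    """Hypothetical signal raised to abandon parsing early."""


class MetaDescriptionParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        # attrs is a list of (name, value) pairs; a value may be None
        attrs = dict(attrs)
        if (attrs.get('name') or '').lower() == 'description':
            self.description = attrs.get('content')
            # Nothing else is needed, so unwind out of feed()
            raise StopParsing()


def extract_description(html):
    """Return the meta description, or None if the document has none."""
    parser = MetaDescriptionParser()
    try:
        parser.feed(html)
    except StopParsing:
        pass  # expected: raised once the description was found
    return parser.description


sample = ('<html><head><meta name="description" content="A short summary">'
          '</head><body><p>rest of the page</p></body></html>')
print(extract_description(sample))
```

Using a dedicated exception class, rather than a built-in one, keeps the except clause from swallowing unrelated errors raised during parsing.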

You can also call self.reset() once you decide that you are done. I have not actually tried it, but according to the documentation, reset() will "Reset the instance. Loses all unprocessed data.", which is precisely what you need.

Tags: python, html, parsing

Michael Mior
