I need to write a boolean function that returns True if a word appears in the text of an HTML page and False if it doesn't.
I know it's easy to do this with the lxml library by walking the whole page tree until the word is found, but iterating through every HTML block to check for the word seems inefficient to me.
Any suggestions for a faster algorithm? I need to run this search many times.
As long as you're not worried about accidentally matching the word inside an element attribute, a tag name, or similar markup (and if you are worried about that, parsing the HTML with something like lxml is essentially your only option), you can treat the entire HTML document as one big string and search for your word in it:
import requests

def checkForWord():
    # Fetch the raw HTML and do a plain substring search over the whole document
    r = requests.get("http://example.com/somepage.html")
    return "myWord" in r.text
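If you search the same page for many different words, one alternative (not part of the answer above) is to parse the page only once, collect its visible text, and build a set of words, so each later lookup is an average O(1) set membership test instead of a scan. This is a minimal sketch using the standard library's html.parser instead of lxml; it assumes "word" means a whole word matched case-insensitively, and the names build_word_index and page_has_word are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Attribute values never reach handle_data, so they are ignored automatically
        if not self._skip_depth:
            self.chunks.append(data)


def build_word_index(html):
    """Hypothetical helper: parse the page once, return the set of lowercased words."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    # Joining chunks with a space keeps text from adjacent elements apart,
    # at the cost of splitting a word that is broken up by inline tags
    text = " ".join(parser.chunks)
    return {w.lower() for w in re.findall(r"\w+", text)}


def page_has_word(index, word):
    """Hypothetical helper: case-insensitive whole-word lookup in the prebuilt set."""
    return word.lower() in index
```

You would build the index once, for example with build_word_index(requests.get(url).text), and then call page_has_word(index, word) as often as you like. Unlike the substring check, this does not match the word inside attributes or scripts, and "myWord" will not match "myWordy".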