Fast algorithm to find a word in an HTML page

I need to write a boolean function that returns true if a word appears in the text of an HTML page and false if it doesn't.

I know this is easy to do with the lxml library by walking the whole page tree until the word is found, but iterating through every HTML block to check whether the word is there seems inefficient.

Any suggestions for a faster algorithm? I need to run this search many times.

arodriguezdonaire asked Oct 20 '22 05:10



1 Answer

As long as you're not worried about accidentally matching the word inside an element attribute or other markup (if you are, parsing the HTML with something like lxml is pretty much your only option), you can treat the entire HTML document as one big string and search it for your word:

import requests

def checkForWord():
    # Fetch the page and search its raw HTML, tags and attributes included
    r = requests.get("http://example.com/somepage.html")
    return "myWord" in r.text
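
If matches inside attributes, scripts, or styles do matter, and the same page is searched many times, one option is to pull the visible text out once and keep its words in a set, so each later lookup is a cheap membership test. Here is a minimal sketch of that idea. It uses the standard library's html.parser instead of lxml purely to stay dependency-free, and the names _TextExtractor, build_word_index, and contains_word are made up for illustration. It also assumes case-insensitive, whole-word matching is what you want.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; tags, attributes, <script> and <style> are ignored."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def build_word_index(html):
    """Parse the page once and return the set of lowercased words in its text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return {word.lower() for word in re.findall(r"\w+", text)}


def contains_word(index, word):
    """Case-insensitive whole-word lookup against a prebuilt index."""
    return word.lower() in index
```

You would call build_word_index(r.text) once per page and then contains_word(index, "myWord") for every word you need to check. Note that a word split across inline tags (for example Hel<b>lo</b>) ends up as two separate words in this sketch.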
Hayden Schiff answered Oct 21 '22 23:10
