I am using the Python lxml library to parse HTML pages:
import lxml.html
# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')
Is there any way to set a timeout for parsing?
It looks to be using urllib.urlopen as the opener, so the easiest way to do this would be to change the default socket timeout:
import socket
timeout = 10
socket.setdefaulttimeout(timeout)
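Of course, this is a quick-and-dirty solution: the new default applies to every socket the process creates afterwards, not only to this one download.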
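If changing a process-wide default is undesirable, one alternative is to do the download yourself with an explicit per-request timeout and hand the result to lxml. The following is a minimal sketch for Python 3, not something taken from lxml's documentation: the helper names fetch_bytes and parse_url are made up for illustration, and it assumes the whole page fits comfortably in memory. It also does not depend on how lxml opens URLs internally.

```python
import io
import socket
import urllib.error
import urllib.request


def fetch_bytes(url, timeout=10.0):
    """Download url; connect and read stalls longer than timeout seconds raise."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_url(url, timeout=10.0):
    """Fetch with a per-request timeout, then let lxml parse the downloaded bytes."""
    # lxml is third-party; imported here so fetch_bytes stays usable without it.
    import lxml.html
    return lxml.html.parse(io.BytesIO(fetch_bytes(url, timeout)), base_url=url)


# Hypothetical usage:
#     try:
#         page = parse_url('http://stackoverflow.com/', timeout=10)
#     except (urllib.error.URLError, socket.timeout) as exc:
#         print('fetch failed or timed out:', exc)
```

Note that the timeout bounds each blocking socket operation (the connect and each read), not the total download time, so a server that keeps sending data slowly can still take longer than 10 seconds overall.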