I would like to read a website asynchronously, which isn't possible with urllib as far as I know. I tried reading with plain sockets, but HTTP is giving me hell: I run into all kinds of funky encodings, for example Transfer-Encoding: chunked, I have to parse all of that manually, and I feel like I'm coding C, not Python, at the moment.
Isn't there a nicer way, like urllib but asynchronous? I don't really feel like re-implementing the whole HTTP specification when it's all been done before.
Twisted isn't an option currently.
Greetings,
Tom
You can implement an asynchronous call yourself: for each call, start a new thread (or grab one from a pool) and use a callback to process the result.
You can do this very nicely with a decorator:
import threading
import urllib

def threaded(callback=lambda *args, **kwargs: None, daemonic=False):
    """Decorate a function to run in its own thread and report the result
    by calling callback with it."""
    def innerDecorator(func):
        def inner(*args, **kwargs):
            target = lambda: callback(func(*args, **kwargs))
            t = threading.Thread(target=target)
            t.setDaemon(daemonic)
            t.start()
        return inner
    return innerDecorator

@threaded()
def get_webpage(url):
    data = urllib.urlopen(url).read()
    print data
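One thing the decorator glosses over is getting results back to the calling thread, since the callback fires on the worker thread. A sketch of one way to collect them, using a Queue to hand results back (the `fetch_all` name and the `fetch_one` parameter are mine, not from any library):

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def fetch_all(urls, fetch_one):
    """Run fetch_one(url) in one thread per URL and return the
    results in completion order."""
    results = queue.Queue()

    def worker(url):
        # Each worker pushes its result onto the thread-safe queue.
        results.put(fetch_one(url))

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results.get() for _ in urls]
```

Passing something like `lambda u: urllib.urlopen(u).read()` as `fetch_one` would fetch the pages concurrently while the caller simply blocks until all are done.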
Have you looked at http://asynchttp.sourceforge.net/?
"Asynchronous HTTP Client for Python
The 'asynchttp' module is a logical extension of the Python library 'asynchat' module which is built on the 'asyncore' and 'select' modules. Our goal is to provide the functionality of the excellent 'httplib' module without using blocking sockets."
The project's last commit was 2001-05-29, so it looks dead. But it might be of interest anyway.
Disclaimer: I have not used it myself.
Also, this blog post has some information on async HTTP.
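For reference, the non-blocking-socket approach those libraries build on can be sketched with just the stdlib `socket` and `select` modules. `async_get` here is a made-up helper, and it sidesteps chunked transfer-encoding by speaking HTTP/1.0 with `Connection: close`, so the end of the response is simply the server closing the socket:

```python
import select
import socket

def async_get(host, port=80, path="/", timeout=5.0):
    """Minimal HTTP/1.0 GET that reads the reply with non-blocking
    recv() calls driven by select(). Returns the raw response bytes,
    headers included. HTTP/1.0 + Connection: close means the server
    won't use chunked transfer-encoding."""
    s = socket.socket()
    s.connect((host, port))
    req = ("GET %s HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n"
           % (path, host)).encode("ascii")
    s.sendall(req)              # the request is tiny; send it blocking
    s.setblocking(False)        # ...but never block waiting for the reply
    chunks = []
    while True:
        readable, _, _ = select.select([s], [], [], timeout)
        if not readable:        # timed out waiting for more data
            break
        data = s.recv(4096)
        if not data:            # server closed the connection: done
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks)
```

A real async client would multiplex many sockets through one select() loop instead of looping over a single one, which is exactly the bookkeeping asyncore/asynchat (and asynchttp on top of them) handle for you.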