I would like to use Twisted's non-blocking getPage method within a webapp, but it feels quite complicated to use compared to urlopen.
This is an example of what I'm trying to achieve:
    def web_request(request):
        response = urllib.urlopen('http://www.example.org')
        return HttpResponse(len(response.read()))
Is it so hard to have something similar with getPage?
The thing to realize about non-blocking operations (which you seem to explicitly want) is that you can't really write sequential code with them. The operations don't block because they don't wait for a result. They start the operation and return control to your function. So, getPage doesn't return a file-like object you can read from like urllib.urlopen does. And even if it did, you couldn't read from it until the data was available (or it would block.) And so you can't call len() on it, since that needs to read all the data first (which would block.)
The way to deal with non-blocking operations in Twisted is through Deferreds, which are objects for managing callbacks. getPage returns a Deferred, which means "you will get this result later". You can't do anything with the result until you get it, so you add callbacks to the Deferred, and the Deferred will call these callbacks when the result is available. That callback can then do what you want it to:
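    from twisted.web.client import getPage

    def web_request(request):
        def callback(data):
            # Runs later, once the page body has arrived; the return value
            # becomes the result passed on by the Deferred.
            return HttpResponse(len(data))
        d = getPage("http://www.example.org")
        d.addCallback(callback)
        return d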
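To make the mechanics concrete, here is a minimal sketch of how a Deferred-style object hands a result to callbacks that were added before the result existed. ToyDeferred and fake_get_page are hypothetical stand-ins written for this illustration; they mimic only the callback-chaining idea and are not Twisted's implementation (no errbacks, no nested Deferreds).

```python
class ToyDeferred(object):
    """Hypothetical, stripped-down stand-in for a Deferred (illustration only)."""

    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def addCallback(self, fn):
        # Queue the callback; if the result already arrived, run it right away.
        self.callbacks.append(fn)
        if self.called:
            self._run()
        return self

    def callback(self, result):
        # Called by whoever produced the result, once it is available.
        self.called = True
        self.result = result
        self._run()

    def _run(self):
        # Each callback receives the previous callback's return value.
        while self.callbacks:
            fn = self.callbacks.pop(0)
            self.result = fn(self.result)


def fake_get_page(url):
    # Assumption: stands in for getPage; returns immediately, with no data yet.
    return ToyDeferred()


d = fake_get_page("http://www.example.org")
d.addCallback(len)
d.addCallback(lambda n: "body is %d bytes" % n)
pending_result = d.result  # still None: no callback has run yet

# Later, the "network" delivers the body and the callback chain fires.
d.callback("<html>hello</html>")
final = d.result
```

The point of the sketch is the ordering: the callbacks are registered first, and nothing runs until something else calls callback() with the data.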
An additional problem with your example is that your web_request function itself is blocking. What do you want to do while you wait for the result of getPage to become available? Do something else within web_request, or just wait? Or do you want to turn web_request itself non-blocking? If so, how do you want to produce the result? (The obvious choice in Twisted is to return another Deferred -- or even the same one as getPage returns, as in the example above. This may not always be appropriate if you're writing code in another framework, though.)
There is a way to write sequential code using Deferreds, although it's somewhat restrictive, harder to debug, and core Twisted people cry when you use it: twisted.internet.defer.inlineCallbacks. It uses the new generator feature in Python 2.5 where you can send data into a generator, and the code would look somewhat like this:
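    from twisted.internet import defer
    from twisted.web.client import getPage

    @defer.inlineCallbacks
    def web_request(request):
        data = yield getPage("http://www.example.org")
        # A Python 2 generator can't "return" a value, so hand it back this way.
        defer.returnValue(HttpResponse(len(data)))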
Like the example that explicitly returned the d Deferred, this'll only work if the caller expects web_request to be non-blocking -- the defer.inlineCallbacks decorator turns the generator into a function that returns a Deferred.
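As a rough picture of what such a decorator does with the generator's send() feature, here is a minimal sketch. Pending, ReturnValue and inline are hypothetical names invented for this illustration, not Twisted's API, and the sketch leaves out error handling and results that are already available when yielded.

```python
class Pending(object):
    """Hypothetical placeholder for a result that arrives later."""

    def __init__(self):
        self.waiters = []

    def on_result(self, fn):
        self.waiters.append(fn)

    def deliver(self, value):
        for fn in self.waiters:
            fn(value)


class ReturnValue(Exception):
    """Hypothetical way for the generator to hand back its final value."""

    def __init__(self, value):
        Exception.__init__(self, value)
        self.value = value


def inline(gen_func):
    """Hypothetical decorator: turns a generator into a function returning a Pending."""
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        finished = Pending()

        def step(value):
            try:
                # Resume the generator; its yield expression evaluates to 'value'.
                waiting_on = gen.send(value)
            except StopIteration:
                finished.deliver(None)
            except ReturnValue as r:
                finished.deliver(r.value)
            else:
                # The generator yielded another Pending: resume when it delivers.
                waiting_on.on_result(step)

        step(None)  # sending None starts the generator
        return finished
    return wrapper


page = Pending()  # stands in for the Deferred that getPage would return


@inline
def web_request(request):
    data = yield page
    raise ReturnValue(len(data))


results = []
web_request(None).on_result(results.append)
before = list(results)    # still empty: the generator is suspended at the yield
page.deliver("x" * 1270)  # the "page" arrives, and the generator is resumed
after = list(results)
```

Note that web_request returns right away with a placeholder, which is why the caller has to be prepared for a non-blocking result.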
I posted a response to a similar question recently that provides the minimal amount of code required to get the contents from a URL using getPage. Here it is for completeness:
    from twisted.web.client import getPage
    from twisted.internet import reactor

    url = 'http://aol.com'

    def print_and_stop(output):
        print output
        if reactor.running:
            reactor.stop()

    if __name__ == '__main__':
        print 'fetching', url
        d = getPage(url)
        d.addCallback(print_and_stop)
        reactor.run()
Keep in mind that you'll probably need a more in-depth understanding of the reactor pattern used by Twisted to handle events (getPage firing being an event in this instance).
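For a feel of what a reactor-style loop does, here is a toy sketch. ToyReactor and call_later are hypothetical names made up for this illustration; a real reactor waits on sockets and timers rather than just sleeping, so treat this only as a picture of "run() blocks and dispatches events until stop() is called".

```python
import heapq
import time


class ToyReactor(object):
    """Hypothetical miniature event loop (illustration only, not Twisted's reactor)."""

    def __init__(self):
        self.running = False
        self._queue = []  # heap of (due_time, sequence, function, args)
        self._seq = 0

    def call_later(self, delay, fn, *args):
        # The sequence number keeps heap ordering stable for equal due times.
        heapq.heappush(self._queue, (time.time() + delay, self._seq, fn, args))
        self._seq += 1

    def stop(self):
        self.running = False

    def run(self):
        # Blocks here, dispatching events until stop() is called or nothing is left.
        self.running = True
        while self.running and self._queue:
            due, _, fn, args = heapq.heappop(self._queue)
            wait = due - time.time()
            if wait > 0:
                time.sleep(wait)
            fn(*args)
        self.running = False


reactor = ToyReactor()
log = []


def page_arrived(body):
    log.append("got %d bytes" % len(body))
    if reactor.running:
        reactor.stop()


# Simulate a page arriving shortly after the loop starts.
reactor.call_later(0.01, page_arrived, "<html></html>")
reactor.call_later(5, log.append, "never reached")  # stop() ends the loop first
reactor.run()
```

This mirrors the shape of the getPage example: set up the work, register what should happen when the event fires, then hand control to the loop.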