
Use Twisted's getPage as urlopen?

I would like to use Twisted's non-blocking getPage method within a webapp, but it feels quite complicated to use such a function compared to urlopen.

This is an example of what I'm trying to achieve:

def web_request(request):
   response = urllib.urlopen('http://www.example.org')
   return HttpResponse(len(response.read()))

Is it so hard to have something similar with getPage?

RadiantHex asked Apr 27 '10 10:04


2 Answers

The thing to realize about non-blocking operations (which you seem to explicitly want) is that you can't really write sequential code with them. The operations don't block because they don't wait for a result. They start the operation and return control to your function. So, getPage doesn't return a file-like object you can read from like urllib.urlopen does. And even if it did, you couldn't read from it until the data was available (or it would block.) And so you can't call len() on it, since that needs to read all the data first (which would block.)

The way to deal with non-blocking operations in Twisted is through Deferreds, which are objects for managing callbacks. getPage returns a Deferred, which means "you will get this result later". You can't do anything with the result until you get it, so you add callbacks to the Deferred, and the Deferred will call these callbacks when the result is available. That callback can then do what you want it to:

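from twisted.web.client import getPage

def web_request(request):
    def callback(data):
        # Runs later, once the whole page body is available.
        # HttpResponse comes from the web framework used in the question.
        return HttpResponse(len(data))
    d = getPage("http://www.example.org")
    d.addCallback(callback)
    # The caller receives the Deferred, which fires with callback's return value.
    return d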
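
To make the callback mechanism more concrete, here is a minimal sketch of an object that stores callbacks and runs them once a result arrives. TinyDeferred and fake_get_page are hypothetical stand-ins written for this illustration; they are not Twisted's Deferred or getPage, which also handle errors, nested Deferreds, and a good deal more.

```python
class TinyDeferred(object):
    """Hypothetical, heavily simplified stand-in for a Deferred (not Twisted's class)."""

    _NO_RESULT = object()

    def __init__(self):
        self._callbacks = []
        self._result = self._NO_RESULT

    def addCallback(self, func):
        # Remember the callback; if the result already arrived, run it right away.
        self._callbacks.append(func)
        if self._result is not self._NO_RESULT:
            self._run()
        return self

    def callback(self, result):
        # Called by whoever produced the result, once it is available.
        self._result = result
        self._run()

    def _run(self):
        # Each callback receives the previous callback's return value.
        while self._callbacks:
            func = self._callbacks.pop(0)
            self._result = func(self._result)


pending = []  # Deferreds whose (pretend) network operation has not finished yet


def fake_get_page(url):
    # Assumption: pretends to start a download and returns immediately.
    d = TinyDeferred()
    pending.append(d)
    return d


lengths = []
d = fake_get_page("http://www.example.org")
d.addCallback(len)             # runs first, once the data exists
d.addCallback(lengths.append)  # receives len()'s return value

assert lengths == []           # nothing has run yet: the call did not block
pending.pop().callback("<html>hello</html>")  # the "network" delivers the body
assert lengths == [18]
```

The point of the sketch is the ordering: fake_get_page returns before any data exists, and the work that depends on the data lives entirely in the callbacks.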

An additional problem with your example is that your web_request function itself is blocking. What do you want to do while you wait for the result of getPage to become available? Do something else within web_request, or just wait? Or do you want to turn web_request itself non-blocking? If so, how do you want to produce the result? (The obvious choice in Twisted is to return another Deferred -- or even the same one as getPage returns, as in the example above. This may not always be appropriate if you're writing code in another framework, though.)

There is a way to write sequential code using Deferreds, although it's somewhat restrictive, harder to debug, and core Twisted people cry when you use it: twisted.internet.defer.inlineCallbacks. It uses the generator feature added in Python 2.5 that lets you send data into a generator (PEP 342), and the code would look somewhat like this:

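from twisted.internet import defer
from twisted.web.client import getPage

@defer.inlineCallbacks
def web_request(request):
    data = yield getPage("http://www.example.org")
    # A Python 2 generator cannot "return value"; returnValue delivers the result.
    defer.returnValue(HttpResponse(len(data)))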

Like the example that explicitly returned the d Deferred, this'll only work if the caller expects web_request to be non-blocking -- the defer.inlineCallbacks decorator turns the generator into a function that returns a Deferred.
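
The generator feature that inlineCallbacks relies on can be illustrated without Twisted. In this sketch, drive is a hypothetical helper that plays the decorator's role in a drastically simplified form: it treats whatever the generator yields as a zero-argument function standing in for a pending operation, computes its result on the spot, and sends that result back in. The real decorator waits for a Deferred to fire instead of calling anything immediately.

```python
def drive(generator):
    """Hypothetical helper: run a generator, sending each yielded job's result back in."""
    try:
        job = next(generator)             # run up to the first yield
        while True:
            result = job()                # real code would wait for a Deferred here
            job = generator.send(result)  # resume; the yield expression evaluates to result
    except StopIteration:
        pass                              # the generator finished


seen = []


def web_request():
    # Reads sequentially, but each yield hands control back to drive().
    data = yield (lambda: "<html>hello</html>")
    seen.append(len(data))
    more = yield (lambda: data.upper())
    seen.append(more)


drive(web_request())
assert seen == [18, "<HTML>HELLO</HTML>"]
```

The body of web_request looks like ordinary top-to-bottom code, yet it only advances when the driver sends the next result in, which is what lets the real decorator suspend it until data is available.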

Thomas Wouters answered Oct 14 '22 18:10


I posted a response to a similar question recently that provides the minimal amount of code required to get the contents from a URL using getPage. Here it is for completeness:

from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    print output
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print 'fetching', url
    d = getPage(url)
    d.addCallback(print_and_stop)
    reactor.run()

Keep in mind that you'll probably need a more in-depth understanding of the reactor pattern used by Twisted to handle events (getPage firing being an event in this instance).
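
As a rough mental model of the reactor pattern, the sketch below runs a loop that dispatches queued events to their handlers until something asks it to stop. ToyReactor and its schedule method are invented for this illustration and are far simpler than Twisted's reactor, which waits on sockets and timers rather than draining a plain queue, and which keeps running even when no work is pending.

```python
from collections import deque


class ToyReactor(object):
    """Hypothetical miniature event loop; not Twisted's reactor."""

    def __init__(self):
        self.running = False
        self._events = deque()

    def schedule(self, handler, *args):
        # Queue an event; nothing runs until the loop gets to it.
        self._events.append((handler, args))

    def stop(self):
        self.running = False

    def run(self):
        # Dispatch events one at a time until stopped or out of work.
        self.running = True
        while self.running and self._events:
            handler, args = self._events.popleft()
            handler(*args)
        self.running = False


reactor = ToyReactor()
log = []


def print_and_stop(output):
    log.append(output)
    if reactor.running:
        reactor.stop()


reactor.schedule(log.append, "fetching")
reactor.schedule(print_and_stop, "page body")
reactor.schedule(log.append, "never reached")  # queued after the stop request
reactor.run()
assert log == ["fetching", "page body"]
```

Everything the program does happens inside run(): handlers are invoked by the loop, not by the code that scheduled them, which is why the getPage callback in the answer's script only fires after reactor.run() has been called.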

jathanism answered Oct 14 '22 18:10