Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Asynchronous URLfetch when we don't care about the result? [Python]

In some code I'm writing for GAE I need to periodically perform a GET on a URL on another system, in essence 'pinging' it and I'm not terribly concerned if the request fails, times out or succeeds.

As I basically want to 'fire and forget' and not slow down my own code by waiting for the request, I'm using an asynchronous urlfetch, and not calling get_result().

In my log I get a warning:

Found 1 RPC request(s) without matching response (presumably due to timeouts or other errors)

Am I missing an obviously better way to do this? A Task Queue or Deferred Task seems (to me) like overkill in this instance.

Any input would appreciated.

like image 526
John Carter Avatar asked Mar 23 '11 20:03

John Carter


2 Answers

A task queue task is your best option here. The message you're seeing in the log indicates that the request is waiting for your URLFetch to complete before returning, so this doesn't help. You say a task is 'overkill', but really, they're very lightweight, and definitely the best way to do this. Deferred will even allow you to just defer the fetch call directly, rather than having to write a function to call.

like image 98
Nick Johnson Avatar answered Nov 13 '22 22:11

Nick Johnson


How long does it take for the async_url_fetch to complete and how long does it take to provide your response?

Here is a possible approach to leverage the way the api works in python.

Some points to consider.

  • Many webservers and reverse proxies will not cancel a request once it has been started. So if your remote server you are pinging cues the request but takes a long time to service it, use a deadline on your create_rpc(deadline=X) such that X will return due to timeout. The ping may still succeed. This technique works against appengine itself as well.

  • GAE RPCs

    • RPCs after being cued via make_call/make_fetch_call are actually only dispatched once one of them is waited on.
    • Also any just finished rpc will have its callback called when the currently waited on one finishes.
    • You can create an async_urlfetch rpc and enqueue it using make_fetch_call as early as possible in handling your request, don't wait on it yet.
    • Do the actual page serving work, like memcache/datastore calls to get the work going. The first call to one of this will perform a wait which will dispatch your async_urlfetch.
    • If the urlfetch completes during this other activity the callback on the urlfetch will be called, allowing you to do handle the result.
    • If you do call get_result() it will block on wait() till the deadline or it returns unless the result is ready.

To recap.

Prepare the long running url_fetch with a reasonable deadline and callback. Enqueue it using make_fetch_call. Do the work you wanted to for the page. Return the page regardless of wether the url_fetch completed or deadlined and without waiting for it.

The underlying RPC layer in GAE is all asynchronous, there seems to be a more sophisticated way to choose what you wish to wait on in the works.

These examples use sleep and a url_fetch to a second instance of the same app.

Example of wait() dispatching rpc work:

class AsyncHandler(RequestHandler):

    def get(self, sleepy=0.0):
        _log.info("create rpc")
        rpc = create_rpc()
        _log.info("make fetch call")
        # url will generate a 404
        make_fetch_call(rpc, url="http://<my_app>.appspot.com/hereiam")
        _log.info("sleep for %r", sleepy)
        sleep(sleepy)
        _log.info("wait")
        rpc.wait()
        _log.info("get_result")
        rpc.get_result()
        _log.info("return")
        return "<BODY><H1>Holla %r</H1></BODY>" % sleepy

Wait called after sleeping for 4 seconds shows dispatch of

2011-03-23 17:08:35.673 /delay/4.0 200 4093ms 23cpu_ms 0kb Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.151 Safari/534.16,gzip(gfe)
I 2011-03-23 17:08:31.583 create rpc
I 2011-03-23 17:08:31.583 make fetch call
I 2011-03-23 17:08:31.585 sleep for 4.0
I 2011-03-23 17:08:35.585 wait
I 2011-03-23 17:08:35.663 get_result
I 2011-03-23 17:08:35.663 return
I 2011-03-23 17:08:35.669 Saved; key: __appstats__:011500, part: 48 bytes, full: 4351 bytes, overhead: 0.000 + 0.006; link: http://<myapp>.appspot.com/_ah/stats/details?tim
2011-03-23 17:08:35.636 /hereiam 404 9ms 0cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine; appid: s~<myapp>),gzip(gfe)

Async dispatched call.

E 2011-03-23 17:08:35.632 404: Not Found Traceback (most recent call last): File "distlib/tipfy/__init__.py", line 430, in wsgi_app rv = self.dispatch(request) File "di
I 2011-03-23 17:08:35.634 Saved; key: __appstats__:015600, part: 27 bytes, full: 836 bytes, overhead: 0.000 + 0.002; link: http://<myapp>.appspot.com/_ah/stats/details?time

Showing using a memcache RPC's wait to kick off the work.

class AsyncHandler(RequestHandler):

    def get(self, sleepy=0.0):
        _log.info("create rpc")
        rpc = create_rpc()
        _log.info("make fetch call")
        make_fetch_call(rpc, url="http://<myapp>.appspot.com/hereiam")
        _log.info("sleep for %r", sleepy)
        sleep(sleepy)
        _log.info("memcache's wait")
        memcache.get('foo')
        _log.info("sleep again")
        sleep(sleepy)
        _log.info("return")
        return "<BODY><H1>Holla %r</H1></BODY>" % sleepy

Appengine Prod Log:

2011-03-23 17:27:47.389 /delay/2.0 200 4018ms 23cpu_ms 0kb Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.151 Safari/534.16,gzip(gfe)
I 2011-03-23 17:27:43.374 create rpc
I 2011-03-23 17:27:43.375 make fetch call
I 2011-03-23 17:27:43.377 sleep for 2.0
I 2011-03-23 17:27:45.378 memcache's wait
I 2011-03-23 17:27:45.382 sleep again
I 2011-03-23 17:27:47.382 return
W 2011-03-23 17:27:47.383 Found 1 RPC request(s) without matching response (presumably due to timeouts or other errors)
I 2011-03-23 17:27:47.386 Saved; key: __appstats__:063300, part: 66 bytes, full: 6869 bytes, overhead: 0.000 + 0.003; link: http://<myapp>.appspot.com/_ah/stats/details?tim
2011-03-23 17:27:45.452 /hereiam 404 10ms 0cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine; appid: s~<myapp>),gzip(gfe)

Async url fetch dispatched when memcache.get calls wait()

E 2011-03-23 17:27:45.446 404: Not Found Traceback (most recent call last): File "distlib/tipfy/__init__.py", line 430, in wsgi_app rv = self.dispatch(request) File "di
I 2011-03-23 17:27:45.449 Saved; key: __appstats__:065400, part: 27 bytes, full: 835 bytes, overhead: 0.000 + 0.002; link: http://<myapp>.appspot.com/_ah/stats/details?time
like image 38
kevpie Avatar answered Nov 13 '22 20:11

kevpie