
Why callbacks are "ugly"? [closed]

Lately I listened to a talk by Guido van Rossum about async I/O in Python 3. I was surprised by the notion of callbacks being "hated" by developers, supposedly for being ugly. I've also discovered the concept of a coroutine, and started reading a coroutine tutorial by David Beazley. So far, coroutines still look pretty esoteric to me - far more obscure and harder to reason about than those "hated" callbacks.

Now I'm trying to find out why some people consider callbacks ugly. True, with callbacks the program no longer looks like a linear piece of code executing a single algorithm. But, well, it isn't - as soon as it has async I/O in it - and there's no point in pretending it is. Instead, I think of such a program as event-driven - you write it by defining how it reacts to relevant events.

Or is there something else about callbacks that is considered bad, besides making programs "non-linear"?

asked Aug 30 '13 by rincewind


1 Answer

Consider this code for reading a protocol header:

import struct

def readn(sock, n):
    # read exactly n bytes, looping because recv may return a short read
    buf = b''
    while n > len(buf):
        newbuf = sock.recv(n - len(buf))
        if not newbuf:
            raise EOFError('connection closed before full read')
        buf += newbuf
    return buf

def readmsg(sock):
    msgtype = readn(sock, 4).decode('ascii')
    size, = struct.unpack('!I', readn(sock, 4))
    data = readn(sock, size)
    return msgtype, size, data

Obviously, if you want to handle more than one user at a time, you can't loop over blocking recv calls like that. So what can you do?

If you use threads, you don't have to do anything to this code; just run each client on a separate thread, and everything is fine. It's like magic. The problem with threads is that you can't run 5000 of them at the same time without slowing your scheduler to a crawl, allocating so much stack space that you go into swap hell, etc. So, the question is, how do we get the magic of threads without the problems?
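For concreteness, here's a minimal sketch of that thread-per-client setup; handle_client is a stand-in for whatever per-connection loop you have (e.g. one that calls readmsg until the client disconnects):

import socket
import threading

def serve(host, port, handle_client):
    srv = socket.socket()
    srv.bind((host, port))
    srv.listen(100)
    while True:
        conn, addr = srv.accept()
        # one thread per client: the blocking readn/readmsg code runs unchanged
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()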

Implicit greenlets are one answer to the problem. Basically, you write threaded code, but it's actually run by a cooperative scheduler which interrupts your code every time you make a blocking call. The problem is that this involves monkeypatching all the known blocking calls, and hoping no libraries you install add any new ones.
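For example, with gevent (one implicit-greenlet framework), the monkeypatching step looks something like this; the blocking code above then runs unchanged, with each client in a greenlet instead of an OS thread:

from gevent import monkey
monkey.patch_all()  # swap socket, time, etc. for cooperative versions

import gevent

def serve(srv, handle_client):
    # srv is a listening socket; accept() now yields to other greenlets instead of blocking
    while True:
        conn, addr = srv.accept()
        gevent.spawn(handle_client, conn)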

Coroutines are an answer to that problem. If you explicitly mark each blocking function call by dropping a yield from before it, nobody needs to monkeypatch anything. You do still need to have async-compatible functions to call, but it's no longer possible to block the whole server without expecting it, and it's much clearer from your code what's going on. The disadvantage is that the reactor code under the covers has to be more complicated… but that's something you write once (or, better, zero times, because it comes in a framework or the stdlib).
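As a sketch of what that looks like (using asyncio streams here; the original 2013 syntax spells await as yield from, but the shape is the same), the protocol reader above becomes:

import struct

async def readn(reader, n):
    buf = b''
    while n > len(buf):
        newbuf = await reader.read(n - len(buf))   # the only explicit suspension point
        if not newbuf:
            raise EOFError('connection closed before full read')
        buf += newbuf
    return buf

async def readmsg(reader):
    msgtype = (await readn(reader, 4)).decode('ascii')
    size, = struct.unpack('!I', await readn(reader, 4))
    data = await readn(reader, size)
    return msgtype, size, data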

With callbacks, the code you write will ultimately do exactly the same thing as with coroutines, but the complexity is now inside your protocol code. You have to effectively turn the flow of control inside out. The most obvious translation is pretty horrible by comparison:

def readn(sock, n, callback):
    buf = b''
    def on_recv(newbuf):
        nonlocal buf
        if not newbuf:
            callback(None, EOFError('connection closed before full read'))
            return
        buf += newbuf
        if len(buf) == n:
            callback(buf)
            return
        # not done yet: schedule another read for the remaining bytes
        async_read(sock, n - len(buf), on_recv)
    async_read(sock, n, on_recv)

def readmsg(sock, callback):
    msgtype, size = None, None
    def on_recv_data(buf, err=None):
        if err:
            callback(None, err)
            return
        callback(msgtype, size, buf)
    def on_recv_size(buf, err=None):
        nonlocal size
        if err:
            callback(None, err)
            return
        size, = struct.unpack('!I', buf)
        readn(sock, size, on_recv_data)
    def on_recv_msgtype(buf, err=None):
        nonlocal msgtype
        if err:
            callback(None, err)
            return
        msgtype = buf.decode('ascii')
        readn(sock, 4, on_recv_size)
    readn(sock, 4, on_recv_msgtype)

Now, obviously, in real life, anyone who writes the callback code that way should be shot; there are much better ways to organize it, like using Futures or Deferreds, using a class with methods instead of a bunch of local closures defined in reverse order with nonlocal statements, and so on.
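As a rough illustration of the class-with-methods organization (still built on the hypothetical async_read primitive via the callback readn above), the same protocol reader might look like this; at least the methods read top to bottom in the order they run:

import struct

class MsgReader:
    def __init__(self, sock, callback):
        self.sock, self.callback = sock, callback
        self.msgtype = self.size = None
        readn(sock, 4, self.on_msgtype)
    def on_msgtype(self, buf, err=None):
        if err:
            return self.callback(None, err)
        self.msgtype = buf.decode('ascii')
        readn(self.sock, 4, self.on_size)
    def on_size(self, buf, err=None):
        if err:
            return self.callback(None, err)
        self.size, = struct.unpack('!I', buf)
        readn(self.sock, self.size, self.on_data)
    def on_data(self, buf, err=None):
        if err:
            return self.callback(None, err)
        self.callback(self.msgtype, self.size, buf)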

But the point is, there is no way to write it in a way that looks even remotely like the synchronous version. The flow of control is inherently central, and the protocol logic is secondary. With coroutines, because the flow of control always just runs straight ahead, it isn't explicit in your code at all, and the protocol logic is all there is to read and write.


That being said, there are plenty of places where writing something with callbacks is better than the coroutine (or synchronous) version, because the whole point of the code is chaining asynchronous events together.

If you read through the Twisted tutorial, you'll see that it's not that hard to make the two mechanisms play nicely together. If you write everything around Deferreds, you can freely use Deferred-composition functions, explicit callbacks, and @inlineCallbacks-style coroutines. In some parts of your code, the flow of control is important and the logic is trivial; in other parts, the logic is complex and you don't want it obscured by the flow of control. So, you can use whichever one makes sense in each case.
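For instance, assuming some fetch_page(url) that returns a Deferred (the name is made up for illustration), the two styles mix like this:

from twisted.internet import defer

# fetch_page(url) -> Deferred is assumed to exist; it's illustrative only

def get_title(url):
    # explicit callback style: chain a transformation onto the Deferred
    d = fetch_page(url)
    d.addCallback(lambda page: page.split('<title>')[1].split('</title>')[0])
    return d

@defer.inlineCallbacks
def get_titles(urls):
    # coroutine style: reads like sequential code, suspends at each yield
    titles = []
    for url in urls:
        page = yield fetch_page(url)
        titles.append(page.split('<title>')[1].split('</title>')[0])
    defer.returnValue(titles)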


In fact, it's worth comparing generators-as-coroutines with generators-as-iterators. Consider:

def squares(n):
    for i in range(n):
        yield i*i

def squares(n):
    class Iterator:
        def __init__(self, n):
            self.i, self.n = 0, n
        def __iter__(self):
            return self
        def __next__(self):
            if self.i >= self.n:
                raise StopIteration
            i, self.i = self.i, self.i + 1
            return i*i
    return Iterator(n)

The first version hides a lot of "magic"—the state of the iterator between next calls isn't explicit anywhere; it's implicit in the local frame of the generator function. And every time you do a yield, the state of the entire program could have changed before the yield returns. And yet, the first version is obviously much clearer and simpler, because there's almost nothing to read except the actual logic of the operation of yielding N squares.
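From the caller's point of view the two definitions are interchangeable:

>>> list(squares(5))
[0, 1, 4, 9, 16]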

Obviously you wouldn't want to cram the state of every program you ever write into a generator. But refusing to use generators at all because they hide the state transitions would be like refusing to use a for loop because it hides the program-counter jumps. And it's exactly the same case with coroutines.

answered Oct 24 '22 by abarnert