
peewee and peewee-async: why is async slower?

I am trying to wrap my head around Tornado and async connections to PostgreSQL. I found a library that can do this at http://peewee-async.readthedocs.io/en/latest/.

I devised a little test to compare traditional Peewee with Peewee-async, but somehow the async version is slower.

This is my app:

import peewee
import tornado.web
import logging
import asyncio
import peewee_async
import tornado.gen
import tornado.httpclient
from tornado.platform.asyncio import AsyncIOMainLoop

AsyncIOMainLoop().install()
app = tornado.web.Application(debug=True)
app.listen(port=8888)

# ===========
# Defining Async model
async_db = peewee_async.PooledPostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
app.objects = peewee_async.Manager(async_db)
class AsyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = async_db
        db_table = 'chats_human'


# ==========
# Defining Sync model
sync_db = peewee.PostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
class SyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = sync_db
        db_table = 'chats_human'

# defining two handlers - async and sync
class AsyncHandler(tornado.web.RequestHandler):

    async def get(self):
        """
        An asynchronous way to create an object and return its ID
        """
        obj = await self.application.objects.create(
            AsyncHuman, messenger_id='12345')
        self.write(
            {'id': obj.id,
             'messenger_id': obj.messenger_id}
        )


class SyncHandler(tornado.web.RequestHandler):

    def get(self):
        """
        A traditional synchronous way
        """
        obj = SyncHuman.create(messenger_id='12345')
        self.write({
            'id': obj.id,
            'messenger_id': obj.messenger_id
        })


app.add_handlers('', [
    (r"/receive_async", AsyncHandler),
    (r"/receive_sync", SyncHandler),
])

# Run loop
loop = asyncio.get_event_loop()
try:
    loop.run_forever()
except KeyboardInterrupt:
    print(" server stopped")

And this is what I get from ApacheBench (ab):

ab -n 100 -c 100 http://127.0.0.1:8888/receive_async

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    4   1.5      5       7
Processing:   621 1049 256.6   1054    1486
Waiting:      621 1048 256.6   1053    1485
Total:        628 1053 255.3   1058    1492

Percentage of the requests served within a certain time (ms)
  50%   1058
  66%   1196
  75%   1274
  80%   1324
  90%   1409
  95%   1452
  98%   1485
  99%   1492
 100%   1492 (longest request)




ab -n 100 -c 100 http://127.0.0.1:8888/receive_sync
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    5   1.9      5       8
Processing:     8  476 277.7    479    1052
Waiting:        7  476 277.7    478    1052
Total:         15  481 276.2    483    1060

Percentage of the requests served within a certain time (ms)
  50%    483
  66%    629
  75%    714
  80%    759
  90%    853
  95%    899
  98%   1051
  99%   1060
 100%   1060 (longest request)

Why is sync faster? Where is the bottleneck I'm missing?

asked Oct 01 '16 06:10 by kurtgn


1 Answer

For a long explanation:

http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/

For a short explanation: synchronous Python code is simple, and its I/O goes mostly through the standard library's socket module, whose core is implemented in C. Async Python code is more complex. Each request requires several passes through the main event loop code, which is written in Python (in the asyncio case here) and therefore has a lot of overhead compared to C code.

Benchmarks like yours show async's overhead dramatically, because there's no network latency between your application and your database, and you're doing a large number of very small database operations. Since every other aspect of the benchmark is fast, these many executions of the event loop logic account for a large proportion of the total runtime.
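
To get a rough feel for that per-operation cost, here is a minimal sketch that runs the same trivial work as a plain function call and as a coroutine that yields to the asyncio event loop once per call. The names sync_op, async_op, run_sync and run_async are made up for this illustration, and the "work" is only a stand-in for a very fast database round trip, so treat the numbers as indicative rather than as a measurement of peewee-async itself.

```python
import asyncio
import time


def sync_op(x):
    # Stand-in for a very fast blocking call (e.g. a local DB round trip).
    return x + 1


async def async_op(x):
    # Same work, but each call yields to the event loop once,
    # so the loop's scheduling code runs for every operation.
    await asyncio.sleep(0)
    return x + 1


def run_sync(n):
    # Time n plain function calls.
    start = time.perf_counter()
    total = 0
    for i in range(n):
        total += sync_op(i)
    return total, time.perf_counter() - start


async def _run_async(n):
    total = 0
    for i in range(n):
        total += await async_op(i)
    return total


def run_async(n):
    # Time n awaited coroutine calls, including event loop startup.
    start = time.perf_counter()
    total = asyncio.run(_run_async(n))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    n = 20_000
    _, sync_time = run_sync(n)
    _, async_time = run_async(n)
    print(f"sync:  {sync_time:.4f}s")
    print(f"async: {async_time:.4f}s")
```

Both variants compute the same result; any difference in elapsed time comes from coroutine and event loop bookkeeping, which is the kind of overhead described above.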

Mike Bayer's argument, linked above, is that low-latency scenarios like this are typical for database applications, and therefore database operations shouldn't be run on the event loop.
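
One common way to keep blocking database calls off the event loop is to hand them to a thread pool with asyncio's run_in_executor. The sketch below is only an illustration of that idea, not code from the linked post: blocking_insert is a hypothetical stand-in for a synchronous ORM call such as SyncHuman.create(...), and it sleeps briefly instead of talking to a real database.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_insert(messenger_id):
    # Hypothetical stand-in for a synchronous ORM call such as
    # SyncHuman.create(messenger_id=...); sleeps instead of hitting a database.
    time.sleep(0.01)
    return {"id": 1, "messenger_id": messenger_id}


async def handle_request(executor, messenger_id):
    loop = asyncio.get_running_loop()
    # The blocking call runs in a worker thread; the event loop stays free
    # to serve other requests while it waits for the result.
    return await loop.run_in_executor(executor, blocking_insert, messenger_id)


async def main():
    # Pool size of 4 is an arbitrary choice for the example.
    with ThreadPoolExecutor(max_workers=4) as executor:
        return await asyncio.gather(
            *(handle_request(executor, str(i)) for i in range(8))
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With this layout the database code stays ordinary synchronous code, and only the handler that awaits the result needs to be a coroutine.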

Async is best for high-latency scenarios, like websockets and web crawlers, where the application spends most of its time waiting for the peer, rather than spending most of its time executing Python.
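
The high-latency case can be sketched the same way. Here each "request" spends its time waiting on a simulated slow peer; LATENCY, fetch_sync and fetch_async are names invented for this example, and the sleeps stand in for real network waits.

```python
import asyncio
import time

LATENCY = 0.05  # assumed per-request peer latency, in seconds


def fetch_sync(i):
    # Blocks the whole thread while "waiting for the peer".
    time.sleep(LATENCY)
    return i


async def fetch_async(i):
    # Yields to the event loop while waiting, so other requests can proceed.
    await asyncio.sleep(LATENCY)
    return i


def run_sequential(n):
    # One request after another: total time grows with n * LATENCY.
    start = time.perf_counter()
    results = [fetch_sync(i) for i in range(n)]
    return results, time.perf_counter() - start


async def _gather(n):
    return await asyncio.gather(*(fetch_async(i) for i in range(n)))


def run_concurrent(n):
    # All waits overlap on a single thread: total time stays near LATENCY.
    start = time.perf_counter()
    results = asyncio.run(_gather(n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = run_sequential(10)
    _, conc_time = run_concurrent(10)
    print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

When waiting dominates like this, the event loop's per-operation cost is small next to the time saved by overlapping the waits, which is the opposite of the low-latency database benchmark in the question.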

In conclusion: if your application has a good reason to be async (it deals with slow peers), having an async database driver is a good idea for the sake of consistent code, but expect some overhead.

If you don't need async for another reason, don't do async database calls, because they're a bit slower.

answered Sep 18 '22 19:09 by A. Jesse Jiryu Davis