 

Python Boto Dynamodb very slow performance for small record set retrieval on range keys

I am testing DynamoDB via boto and have found it to be surprisingly slow at retrieving data sets with hash key/range key condition queries. I have seen some discussion about the oddity that makes SSL (is_secure) about 6x faster than non-SSL, and I can confirm that finding. But even using SSL, I am seeing 1-2 seconds to retrieve 300 records using a hash key/range key condition on a fairly small data set (fewer than 1K records).

Running the profilehooks profiler, I see a lot of extraneous time spent in ssl.py: on the order of 20617 ncalls to retrieve the 300 records. Even if I allow for 10 calls per record, that is still about 6x more than I would expect. This is on a medium instance, though the same results occur on a micro instance. The table is provisioned for 500 reads/sec and 1000 writes/sec, with no throttles logged.
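As a quick check of that arithmetic, the figures from the profile can be plugged in directly. This uses only numbers that appear in this post (300 records, 20617 `ssl.py` recv calls, and 107 `layer1.py` query calls, one per HTTP request); nothing here is newly measured.

```python
# Figures copied from the profile in this question.
records = 300             # records returned by the range-key query
recv_calls = 20617        # ncalls for ssl.py:209(recv)
http_requests = 107       # ncalls for layer1.py:435(query), one per HTTP request
expected_per_record = 10  # the generous allowance mentioned above

calls_per_record = recv_calls / records           # about 68.7
excess = calls_per_record / expected_per_record   # about 6.9x
calls_per_request = recv_calls / http_requests    # about 192.7

print(f"{calls_per_record:.1f} recv calls per record ({excess:.1f}x the allowance)")
print(f"{calls_per_request:.1f} recv calls per HTTP request")
```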

I have looked at doing a batch request, but batch requests can't use range key conditions, which rules that option out for me.

Any ideas on where I'm losing time would be greatly appreciated!

  144244 function calls in 2.083 CPU seconds

Ordered by: cumulative time, internal time, call count

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.001    0.001    2.083    2.083 eventstream.py:427(session_range)
  107    0.006    0.000    2.081    0.019 dynamoDB.py:36(rangeQ)
  408    0.003    0.000    2.073    0.005 layer2.py:493(query)
  107    0.001    0.000    2.046    0.019 layer1.py:435(query)
  107    0.002    0.000    2.040    0.019 layer1.py:119(make_request)
  107    0.006    0.000    1.988    0.019 connection.py:699(_mexe)
  107    0.001    0.000    1.916    0.018 httplib.py:956(getresponse)
  107    0.002    0.000    1.913    0.018 httplib.py:384(begin)
  662    0.049    0.000    1.888    0.003 socket.py:403(readline)
20617    0.040    0.000    1.824    0.000 ssl.py:209(recv)
20617    0.036    0.000    1.785    0.000 ssl.py:130(read)
20617    1.748    0.000    1.748    0.000 {built-in method read}
  107    0.002    0.000    1.738    0.016 httplib.py:347(_read_status)
  107    0.001    0.000    0.170    0.002 mimetools.py:24(__init__)
  107    0.000    0.000    0.165    0.002 rfc822.py:88(__init__)
  107    0.007    0.000    0.165    0.002 httplib.py:230(readheaders)
  107    0.001    0.000    0.031    0.000 __init__.py:332(loads)
  107    0.001    0.000    0.028    0.000 decoder.py:397(decode)
  107    0.008    0.000    0.026    0.000 decoder.py:408(raw_decode)
  107    0.001    0.000    0.026    0.000 httplib.py:910(request)
  107    0.003    0.000    0.026    0.000 httplib.py:922(_send_request)
  107    0.001    0.000    0.025    0.000 connection.py:350(authorize)
  107    0.004    0.000    0.024    0.000 auth.py:239(add_auth)
 3719    0.011    0.000    0.019    0.000 layer2.py:31(item_object_hook)
  301    0.010    0.000    0.018    0.000 item.py:38(__init__)
22330    0.015    0.000    0.015    0.000 {method 'append' of 'list' objects}
  107    0.001    0.000    0.012    0.000 httplib.py:513(read)
  214    0.001    0.000    0.011    0.000 httplib.py:735(send)
  856    0.002    0.000    0.010    0.000 __init__.py:1034(debug)
  214    0.001    0.000    0.009    0.000 ssl.py:194(sendall)
  107    0.000    0.000    0.008    0.000 httplib.py:900(endheaders)
  107    0.001    0.000    0.008    0.000 httplib.py:772(_send_output)
  107    0.001    0.000    0.008    0.000 auth.py:223(string_to_sign)
  856    0.002    0.000    0.008    0.000 __init__.py:1244(isEnabledFor)
  137    0.001    0.000    0.008    0.000 httplib.py:603(_safe_read)
  214    0.001    0.000    0.007    0.000 ssl.py:166(send)
  214    0.007    0.000    0.007    0.000 {built-in method write}
 3311    0.006    0.000    0.006    0.000 item.py:186(__setitem__)
  107    0.001    0.000    0.006    0.000 auth.py:95(sign_string)
  137    0.001    0.000    0.006    0.000 socket.py:333(read)
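The listing above came from profilehooks. A comparable report can be produced with the standard library's cProfile and pstats modules; this sketch sorts by cumulative time, internal time and call count, matching the header of that listing. `range_query_stub` is a hypothetical placeholder for the real boto query call, and `profile_call` is an illustrative helper name, not part of either library.

```python
import cProfile
import io
import pstats


def range_query_stub(n):
    """Hypothetical stand-in for the DynamoDB range-key query being profiled."""
    return [{"id": i} for i in range(n)]


def profile_call(func, *args, limit=25):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Same ordering as the profile above: cumulative time, internal time, call count.
    stats.sort_stats("cumulative", "time", "calls").print_stats(limit)
    return result, out.getvalue()


result, report = profile_call(range_query_stub, 300)
print(report)
```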
jaredmsaul asked Apr 12 '12 12:04



1 Answer

This isn't a complete answer but I thought it was worth posting it at this time.

I've heard reports like this from a couple of people over the last few weeks. I was able to reproduce the anomaly of HTTPS being considerably faster than HTTP but wasn't able to track it down. It seemed like the problem was unique to Python/boto, but it turns out the same issue was found in C#/.NET, and investigating it there showed that the underlying problem was the use of Nagle's algorithm in the Python and .NET libraries. In .NET it's easy to turn this off, but unfortunately it's not as easy in Python.
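Nagle's algorithm makes the kernel hold back small outgoing TCP segments while earlier data is still unacknowledged; combined with delayed ACKs on the receiving side, this can stall small request/response exchanges. It is disabled per socket with the `TCP_NODELAY` option. A minimal sketch of checking and setting it on a plain socket (the helper name `nodelay_enabled` is just for illustration):

```python
import socket


def nodelay_enabled(sock):
    """Return True if Nagle's algorithm is disabled (TCP_NODELAY set) on sock."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    before = nodelay_enabled(sock)  # usually False: Nagle is on by default
    # Disable Nagle's algorithm so small writes are sent immediately instead of
    # waiting for earlier segments to be acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    after = nodelay_enabled(sock)
finally:
    sock.close()

print(f"TCP_NODELAY before: {before}, after: {after}")
```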

To test this, I wrote a simple script that performed 1000 GetItem requests in a loop. The item being fetched was very small, well under 1K. Running this on Python 2.6.7 on an m1.medium instance in the us-east-1 region produced these results:
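The test script itself wasn't posted. A minimal sketch of that kind of timing loop, assuming a zero-argument callable that performs one GetItem request; `timed_loop`, `ThrottledError` and the `request` argument are hypothetical names, not boto APIs:

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the service's throttling exception."""


def timed_loop(request, iterations):
    """Call request() `iterations` times; report wall-clock runtime and throttles.

    `request` is a hypothetical zero-argument callable standing in for a single
    GetItem call.
    """
    throttled = 0
    start = time.perf_counter()
    for _ in range(iterations):
        try:
            request()
        except ThrottledError:
            throttled += 1
    runtime = time.perf_counter() - start
    print(f"RUNTIME = {runtime:f}")
    print(f"Throttling exceptions: {throttled}")
    return runtime, throttled


# Usage with a no-op request; swap in a real GetItem call to measure latency.
timed_loop(lambda: None, 1000)
```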

>>> http_data = speed_test(False, 1000)
dynamoDB_speed_test - RUNTIME = 53.120193
Throttling exceptions: 0
>>> https_data = speed_test(True, 1000)
dynamoDB_speed_test - RUNTIME = 8.167652
Throttling exceptions: 0

Note that the table has enough provisioned capacity to avoid any throttling by the service, and the unexpected gap between HTTP and HTTPS is clear.

I next ran the same test in Python 2.7.2:

>>> http_data = speed_test(False, 1000)
dynamoDB_speed_test - RUNTIME = 5.668544
Throttling exceptions: 0
>>> https_data = speed_test(True, 1000)
dynamoDB_speed_test - RUNTIME = 7.425210
Throttling exceptions: 0

So 2.7 seems to have fixed this issue. I then applied a simple patch to httplib.py in 2.6.7. The patch simply sets the TCP_NODELAY option on the socket associated with the HTTPConnection object, like this:

self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

I then re-ran the test on 2.6.7:

>>> http_data = speed_test(False, 1000)
dynamoDB_speed_test - RUNTIME = 5.914109
Throttling exceptions: 0
>>> https_data = speed_test(True, 1000)
dynamoDB_speed_test - RUNTIME = 5.137570
Throttling exceptions: 0

Even better, although HTTPS is still unexpectedly faster than HTTP. It's hard to know whether that difference is significant.
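Converting the RUNTIME figures reported in this answer into per-request latency makes the comparison easier to read. The numbers are copied from the runs above; the dictionary layout is only for illustration.

```python
# RUNTIME values (seconds for 1000 GetItem calls) as reported in this answer.
runs = {
    ("2.6.7", "HTTP"): 53.120193,
    ("2.6.7", "HTTPS"): 8.167652,
    ("2.7.2", "HTTP"): 5.668544,
    ("2.7.2", "HTTPS"): 7.425210,
    ("2.6.7 + TCP_NODELAY", "HTTP"): 5.914109,
    ("2.6.7 + TCP_NODELAY", "HTTPS"): 5.137570,
}
REQUESTS = 1000

for (version, scheme), seconds in runs.items():
    print(f"{version:<22} {scheme:<5} {seconds / REQUESTS * 1000:6.2f} ms/request")

# How much slower plain HTTP was than HTTPS on unpatched 2.6.7 (about 6.5x).
http_penalty = runs[("2.6.7", "HTTP")] / runs[("2.6.7", "HTTPS")]
# How much the TCP_NODELAY patch sped up HTTP on 2.6.7 (about 9.0x).
speedup = runs[("2.6.7", "HTTP")] / runs[("2.6.7 + TCP_NODELAY", "HTTP")]
print(f"HTTP penalty on 2.6.7: {http_penalty:.1f}x, TCP_NODELAY speedup: {speedup:.1f}x")
```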

So I'm looking for ways to programmatically configure the socket of HTTPConnection objects so that TCP_NODELAY is set. It's not easy to get at that in httplib.py. My best advice for the moment is to use Python 2.7, if possible.
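One way to set the option without editing the standard library is to subclass the connection class and set it in `connect()`. This is a sketch for Python 3's `http.client` (the Python 2 module is `httplib`); the class names are hypothetical, and it does not by itself make boto use these classes, since boto manages its own connection objects.

```python
import http.client
import socket


class NoDelayHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that disables Nagle's algorithm once connected."""

    def connect(self):
        super().connect()
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


class NoDelayHTTPSConnection(http.client.HTTPSConnection):
    """HTTPS variant: the base connect() has already wrapped the socket in TLS."""

    def connect(self):
        super().connect()
        # SSLSocket subclasses socket.socket, so setsockopt reaches the TCP socket.
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

Usage is the same as the base classes, for example `NoDelayHTTPConnection("example.com").request("GET", "/")`.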

garnaat answered Oct 07 '22 13:10
