Python `socket.getaddrinfo` taking 5 seconds for about 0.1% of requests

We run a Django project (Python) that communicates with various web services, and occasionally requests take around 5 seconds instead of their usual < 100 ms.

I've narrowed this down to time spent in the socket.getaddrinfo function. It is called by requests when we connect to external services, but the problem also appears to affect the default Django connection to the Postgres database box in the cluster. When we restart uwsgi after a deployment, the first requests that come in take 5 seconds to send a response. I also believe that our celery tasks regularly take 5 seconds, but I've not added statsd timer tracking to them yet.

I've written some code to reproduce the issue:

import socket
import timeit

def single_dns_lookup():
    start = timeit.default_timer()
    socket.getaddrinfo('stackoverflow.com', 443)
    end = timeit.default_timer()
    return int(end - start)

timings = {}

for _ in range(0, 10000):
    time = single_dns_lookup()
    try:
        timings[time] += 1
    except KeyError:
        timings[time] = 1

print(timings)  # parenthesised form works on both Python 2 and Python 3

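Typical results are {0: 9921, 5: 79}, that is, 79 of the 10,000 lookups took between 5 and 6 seconds and the rest took less than 1 second.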

My colleague has already pointed to potential issues around IPv6 lookup times and has added this line to /etc/gai.conf:

precedence ::ffff:0:0/96  100

This has definitely improved lookups from non-Python programs such as curl which we use, but not from Python itself. The server boxes are running Ubuntu 16.04.3 LTS and I'm able to reproduce this on a vanilla VM with Python 2.
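To check whether the slow lookups are tied to the IPv6 side of the query, one option is to time the same lookup with the address family left unspecified and again restricted to AF_INET, then compare how many calls exceed a threshold. The following is only a sketch: the function name `count_slow_lookups` and its `threshold`, `resolver` and `timer` parameters are hypothetical and exist purely so the measurement can be exercised without real DNS traffic.

```python
import socket
import timeit


def count_slow_lookups(host, port, family=socket.AF_UNSPEC, attempts=100,
                       threshold=1.0, resolver=socket.getaddrinfo,
                       timer=timeit.default_timer):
    """Return how many of `attempts` lookups took longer than `threshold` seconds.

    `resolver` and `timer` are injectable (an assumption of this sketch)
    so the function can be tested with fakes.
    """
    slow = 0
    for _ in range(attempts):
        start = timer()
        resolver(host, port, family)
        if timer() - start > threshold:
            slow += 1
    return slow
```

Calling `count_slow_lookups('stackoverflow.com', 443, socket.AF_UNSPEC)` and then the same with `socket.AF_INET` would show whether the 5 second lookups disappear when only IPv4 addresses are requested.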

What steps can I take to improve the performance of Python lookups so that they all take < 1 s?

asked Oct 19 '17 18:10 by jamesc


2 Answers

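5 seconds is the default timeout for a DNS lookup in the glibc resolver.

You can lower it with the `timeout` option in /etc/resolv.conf (for example, `options timeout:1`).

Your real problem, though, is probably UDP packets being silently dropped on the network.

Edit: You could also experiment with resolution over TCP. I have never done that myself, but it might help.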

answered Sep 27 '22 21:09 by Krzysztof Szularz


There are two things you can do. First, you can avoid querying for the IPv6 address altogether by monkey patching getaddrinfo so that it always requests AF_INET:

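import socket

orig_getaddrinfo = socket.getaddrinfo

def _getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    # Always ask for IPv4 only; the caller's family argument is ignored.
    return orig_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)

# Replace the resolver for the whole process.
socket.getaddrinfo = _getaddrinfo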
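Note that this patch applies to the whole process and overrides any `family` the caller passes. If that is too broad, a scoped variant is possible. The following is a minimal sketch, assuming you only want IPv4-only lookups around specific calls; the name `ipv4_only_getaddrinfo` is hypothetical and not part of the standard library.

```python
import contextlib
import socket


@contextlib.contextmanager
def ipv4_only_getaddrinfo():
    """Temporarily make socket.getaddrinfo default to IPv4 lookups."""
    original = socket.getaddrinfo

    def ipv4_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        # Only override the family when the caller left it unspecified,
        # so explicit AF_INET6 requests still behave as asked.
        if family == socket.AF_UNSPEC:
            family = socket.AF_INET
        return original(host, port, family, type, proto, flags)

    socket.getaddrinfo = ipv4_getaddrinfo
    try:
        yield
    finally:
        # Restore the original resolver even if the body raised.
        socket.getaddrinfo = original
```

Used as `with ipv4_only_getaddrinfo(): ...`, the original function is restored on exit. The swap is still global while the block runs, so other threads resolving names at the same time are affected too.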

Second, you can cache the lookup results in a TTL-based cache, for example with the cachepy package:

import socket
import timeit

from cachepy import Cache
# Alternative without expiry: from cachetools import cached, then use @cached(cache={})

cache_with_ttl = Cache(ttl=600)  # ttl given in seconds

orig_getaddrinfo = socket.getaddrinfo

@cache_with_ttl
def _getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    # Cached, IPv4-only lookup; the caller's family argument is ignored.
    return orig_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)

socket.getaddrinfo = _getaddrinfo

def single_dns_lookup():
    start = timeit.default_timer()
    socket.getaddrinfo('stackoverflow.com', 443)
    end = timeit.default_timer()
    return int(end - start)

timings = {}

for _ in range(0, 10000):
    time = single_dns_lookup()
    try:
        timings[time] += 1
    except KeyError:
        timings[time] = 1

print(timings)
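If you would rather not add a third-party dependency, the same idea can be approximated with the standard library alone. This is only a sketch under stated assumptions: it targets Python 3.3+ (for `time.monotonic`), it is not thread-safe, it never evicts old entries, and the name `make_ttl_cached_getaddrinfo` and its `resolver`, `ttl` and `clock` parameters are hypothetical.

```python
import socket
import time


def make_ttl_cached_getaddrinfo(resolver=socket.getaddrinfo, ttl=600.0,
                                clock=time.monotonic):
    """Wrap `resolver` so results are reused for `ttl` seconds per argument set."""
    cache = {}  # maps argument tuple -> (expiry time, result)

    def cached_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        key = (host, port, family, type, proto, flags)
        now = clock()
        hit = cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh, skip the DNS lookup
        result = resolver(host, port, family, type, proto, flags)
        cache[key] = (now + ttl, result)
        return result

    return cached_getaddrinfo
```

It could be installed with `socket.getaddrinfo = make_ttl_cached_getaddrinfo()`. As with any DNS cache in the application, a record that changes upstream will not be noticed until its entry expires.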

answered Sep 27 '22 22:09 by Tarun Lalwani