
PyMongo find query returns empty/partial cursor when running in a Django+uWSGI project

We developed a REST API using Django and MongoDB (with the PyMongo driver). The problem is that, on some requests to the API endpoints, the PyMongo cursor returns a partial result containing fewer documents than it should (the response is still a completely valid JSON document).

Let me explain it with an example of one of our views:

def get_data(key):
    return collection.find({'key': key}, limit=24)

def my_view(request):
    key = request.POST.get('key')
    query = get_data(key)
    res = [app for app in query]
    return JsonResponse({'list': res})

We're sure that there are more than 8000 documents matching the query, but on some calls we get fewer than 24 results (sometimes zero). The first problem we found was that our code defined more than one MongoClient. Fixing that made the problem less frequent, but it still occurred on many calls.

After these investigations, we designed a test that sends 16 asynchronous requests to the server at the same time. With this approach we could reproduce the problem: of the 16 requests, 6-8 returned partial results. After running this test we reduced uWSGI's number of processes to 6 and restarted the server. All results were correct, but after we applied another heavy load to the server, the problem came back. At that point we restarted the uWSGI service and everything was OK again.

This last experiment gives us a clue: right after the uWSGI service starts, everything works correctly, but after a period of time under heavy load the server begins to return partial or empty results again. Our latest experiment was to run the API using python manage.py with DEBUG=False, and the problem appeared again after a period of time.
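The 16-request test described above could be sketched roughly as follows. This is a minimal sketch, not the original test script: count_partial and fetch_from_api are hypothetical helper names, and the URL and key are placeholders. It also assumes the view accepts a plain form-encoded POST (for example, with CSRF checks disabled for that endpoint).

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import parse, request


def fetch_from_api(url='http://localhost:8000/my-view/', key='some-key'):
    # Hypothetical endpoint and key; POSTs the key and returns the 'list' field.
    body = parse.urlencode({'key': key}).encode()
    with request.urlopen(url, data=body) as resp:
        return json.loads(resp.read().decode())['list']


def count_partial(fetch, n_requests=16, expected=24):
    # Run fetch() n_requests times in parallel and collect the sizes
    # of all responses that contain fewer than `expected` items.
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(fetch) for _ in range(n_requests)]
        sizes = [len(f.result()) for f in futures]
    return [size for size in sizes if size < expected]


# Against a running server (not executed here):
# short = count_partial(fetch_from_api)
# print(len(short), 'of 16 responses were partial:', short)
```

Passing the fetch function in as an argument keeps the concurrency logic separate from the HTTP call, so the same helper can be pointed at any endpoint or at a stub.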

We can't figure out what the problem is or how to solve it. One explanation we can think of is that Django closes PyMongo’s connections before the query completes, since the returned result is always valid JSON.

Our stack is:

  • nginx (with no cache enabled)
  • uWSGI
  • Memcached (disabled while debugging)
  • Django (v1.8 on Python 3)
  • PyMongo (v3.0.3)

Your help is really appreciated.

Update:

Mongo version:

db version v3.0.7
git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
  • We are running a single mongod instance, with no sharding or replication.
  • We create the connection using this snippet:

    con = MongoClient('localhost', 27017)

Update 2

There is a thread on this subject in the PyMongo issue tracker.

Shahinism asked Oct 20 '15 10:10


1 Answer

PyMongo cursors are not thread-safe, so using them the way I did in a multi-threaded environment causes the behavior described in the question. Python's list operations, on the other hand, are mostly thread-safe, so materializing the cursor into a list inside get_data solves the problem:

def get_data(key):
    # Materialize the cursor right away so only a plain list leaves this function.
    return list(collection.find({'key': key}, limit=24))

def my_view(request):
    key = request.POST.get('key')
    res = get_data(key)  # already a list, no second copy needed
    return JsonResponse({'list': res})
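
One way to picture why handing out a list is safer than handing out a live cursor is the toy example below. It is only an analogy under stated assumptions: a plain Python list iterator stands in for the cursor, and run_workers and consume are hypothetical helpers, so it says nothing about PyMongo's internals. When two threads drain the same iterator, the 24 documents are split between them, while workers that each receive their own list always see all 24.

```python
import threading


def consume(source, out, index):
    # Each worker drains whatever it is handed, like a view iterating a cursor.
    out[index] = [doc for doc in source]


def run_workers(sources):
    # Start one thread per source and return what each thread collected.
    results = [None] * len(sources)
    threads = [threading.Thread(target=consume, args=(src, results, i))
               for i, src in enumerate(sources)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


docs = [{'n': i} for i in range(24)]  # stand-in for 24 matching documents

# Shared iterator: both workers pull from the same stream of documents.
shared = iter(docs)
shared_results = run_workers([shared, shared])

# Materialized: each worker gets its own independent list.
own_results = run_workers([list(docs), list(docs)])

print([len(r) for r in shared_results])  # lengths add up to 24; the split varies
print([len(r) for r in own_results])     # [24, 24]
```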

Shahinism answered Nov 01 '22 12:11