I have a MongoDB collection with more than 1,000,000 documents. I perform an initial .find({ my_query }) to return a subset of those documents (~25,000), which I put into a list. I then loop over each object in the list, parse some values from the document, and perform an additional query using those parsed values with the following code:
def _perform_queries(query):
    conn = pymongo.MongoClient('mongodb://localhost:27017')
    try:
        coll = conn.databases['race_results']
        races = coll.find(query).sort("date", -1)
    except BaseException as err:
        print('An error occurred in runner query: %s\n' % err)
    finally:
        conn.close()
    return races
In this case, my query dictionary is:
{"$and": [
    {"opponents": {"$elemMatch": {"$and": [
        {"runner.name": name},
        {"runner.jockey": jockey}
    ]}}},
    {"summary.dist": "1"}
]}
Here is my issue. I have created an index on opponents.runner.name and opponents.runner.jockey. This makes the queries really fast. However, after about 10,000 queries in a row, pymongo raises an exception:
pymongo.errors.AutoReconnect: [Errno 49] Can't assign requested address
When I remove the index, I don't see this error, but then each query takes about 0.5 seconds, which is unusable in my case.
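For reference, the index creation looked something like this (a sketch; the question doesn't say whether it was one compound index or two single-field indexes, so a compound index is assumed here):
import pymongo

conn = pymongo.MongoClient('mongodb://localhost:27017')
coll = conn.databases['race_results']

# Compound index covering both fields used inside the $elemMatch above
coll.create_index([
    ("opponents.runner.name", pymongo.ASCENDING),
    ("opponents.runner.jockey", pymongo.ASCENDING),
])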
Does anyone know why the [Errno 49] Can't assign requested address could be occurring? I've seen a few other SO questions related to can't assign requested address, but not in relation to pymongo, and their answers don't lead me anywhere.
UPDATE:
Following Serge's advice below, here is the output of ulimit -a:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 2560
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 709
virtual memory (kbytes, -v) unlimited
My MongoDB is running on OS X Yosemite.
This is because you are using PyMongo incorrectly. You are creating a new MongoClient for each query, which requires you to open a new socket for each query. This defeats PyMongo's connection pooling, and besides being extremely slow, it also means you open and close sockets faster than your TCP stack can keep up: you leave too many sockets in TIME_WAIT state, so you eventually run out of ephemeral ports.
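You can verify this while the loop is running by counting sockets stuck in TIME_WAIT. A minimal diagnostic sketch using the third-party psutil package (an assumption; any tool that lists TCP connection states would do):
import psutil  # third-party: pip install psutil

# Count TCP sockets in TIME_WAIT; on OS X, listing all system
# connections may require running as root.
time_wait = sum(
    1 for c in psutil.net_connections(kind="tcp")
    if c.status == psutil.CONN_TIME_WAIT
)
print("sockets in TIME_WAIT: %d" % time_wait)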
Luckily, the fix is simple. Create one MongoClient and use it throughout:
import pymongo

conn = pymongo.MongoClient('mongodb://localhost:27017')
coll = conn.databases['race_results']

def _perform_queries(query):
    return coll.find(query).sort("date", -1)
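Called in a loop, every query then reuses sockets from the client's connection pool instead of opening new ones. For example (a sketch of the workflow described in the question; the initial filter and the parsed field names are placeholders, not from the original code):
# Driver loop: all ~25,000 follow-up queries share one client and its pool.
docs = list(coll.find({"summary.dist": "1"}))  # placeholder initial filter
for doc in docs:
    name = doc["name"]      # assumed field
    jockey = doc["jockey"]  # assumed field
    query = {"$and": [
        {"opponents": {"$elemMatch": {"$and": [
            {"runner.name": name},
            {"runner.jockey": jockey},
        ]}}},
        {"summary.dist": "1"},
    ]}
    for race in _perform_queries(query):
        pass  # process each race document here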