Currently I have a mongo document that looks like this:
{'_id': id, 'title': title, 'date': date}
What I'm trying is to search within this document by title, in the database I have like 5ks items which is not much, but my file has 1 million of titles to search.
I have ensure the title as index within the collection, but still the performance time is quite slow (about 40 seconds per 1000 titles, something obvious as I'm doing a query per title), here is my code so far:
Work repository creation:
class WorkRepository(GenericRepository, Repository):
def __init__(self, url_root):
super(WorkRepository, self).__init__(url_root, 'works')
self._db[self.collection].ensure_index('title')
The entry of the program (is a REST api):
start = time.clock()
for work in json_works: #1000 titles per request
result = work_repository.find_works_by_title(work['title'])
if result:
works[work['id']] = result
end = time.clock()
print end-start
return json_encoder(request, works)
and find_works_by_title code:
def find_works_by_title(self, work_title):
works = list(self._db[self.collection].find({'title': work_title}))
return works
I'm new to mongo and probably I've made some mistake, any recommendation?
You're making one call to the DB for each of your titles. The roundtrip is going to significantly slow the process down (the program and the DB will spend most of their time doing network communications instead of actually working).
Try the following (adapt it to your program's structure, of course):
# Build a list of the 1000 titles you're searching for.
titles = [w["title"] for w in json_works]
# Make exactly one call to the DB, asking for all of the matching documents.
return collection.find({"title": {"$in": titles}})
Further reference on how the $in
operator works: http://docs.mongodb.org/manual/reference/operator/query/in/
If after that your queries are still slow, use explain
on the find
call's return value (more info here: http://docs.mongodb.org/manual/reference/method/cursor.explain/) and check that the query is, in fact, using an index. If it isn't, find out why.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With