
Can't get allowDiskUse:True to work with pymongo

I'm running into the "aggregation result exceeds maximum document size (16MB)" error when running a MongoDB aggregation through pymongo.

I was able to overcome it at first using the limit() option. However, at some point I got this error instead:

Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.

OK, so I'll use the {'allowDiskUse': True} option. It works when I use it on the command line, but when I try to use it in my Python code

result = work1.aggregate(pipe, 'allowDiskUse:true') 

I get a "TypeError: aggregate() takes exactly 2 arguments (3 given)" error, in spite of the definition given at http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.aggregate: aggregate(pipeline, **kwargs).

I tried to use runCommand, or rather its pymongo equivalent:

db.command('aggregate','work1',pipe, {'allowDiskUse':True}) 

but now I'm back to the "aggregation result exceeds maximum document size (16MB)" error.

In case you need it, here is the pipeline:

pipe = [{'$project': {'_id': 0, 'summary.trigrams': 1}}, {'$unwind': '$summary'}, {'$unwind': '$summary.trigrams'}, {'$group': {'count': {'$sum': 1}, '_id': '$summary.trigrams'}}, {'$sort': {'count': -1}}, {'$limit': 10000}] 

Thank you

David Makovoz asked Dec 03 '14 13:12



1 Answer

So, in order:

  • aggregate is a method. It takes 2 positional arguments (self, which is implicitly passed, and pipeline) and any number of keyword arguments (which must be passed as foo=bar -- if there's no = sign, it's not a keyword argument). This means you need to call result = work1.aggregate(pipe, allowDiskUse=True).

  • Your error about maximum document size is inherent to Mongo. Mongo can never return a single document (or array thereof) larger than 16 megabytes. I can't tell you exactly which part is too large without seeing your data, but it probably means that the document you're building as an end result is too big. Try decreasing the $limit parameter: start by setting it to 1, run a test, then increase it and look at how big the result gets each time.
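Both points can be put together in a short sketch. It assumes `collection` is a pymongo Collection such as `work1` from the question; the helper names `build_trigram_pipeline` and `run_with_disk_use` are made up for illustration and are not part of pymongo. The pipeline is the one from the question, with the $limit value turned into a parameter so it can be raised step by step.

```python
def build_trigram_pipeline(limit):
    """Return the pipeline from the question with a configurable $limit."""
    return [
        {'$project': {'_id': 0, 'summary.trigrams': 1}},
        {'$unwind': '$summary'},
        {'$unwind': '$summary.trigrams'},
        {'$group': {'_id': '$summary.trigrams', 'count': {'$sum': 1}}},
        {'$sort': {'count': -1}},
        {'$limit': limit},
    ]


def run_with_disk_use(collection, limit=1):
    """Run the pipeline, opting in to temporary files on disk.

    `collection` is assumed to be a pymongo Collection (hypothetical
    helper, not a pymongo API).
    """
    pipeline = build_trigram_pipeline(limit)
    # allowDiskUse must be a keyword argument (name=value),
    # not a string such as 'allowDiskUse:true' and not a dict.
    return collection.aggregate(pipeline, allowDiskUse=True)
```

With a real collection this would be called as run_with_disk_use(work1, limit=1), raising limit between runs to see at which point the result gets too large.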

Max Noel answered Sep 22 '22 20:09