Can anyone suggest how to handle the "document size exceeds 16MB" error when inserting a document into a MongoDB collection? I found some solutions, such as GridFS. GridFS can handle this problem, but I need a solution that does not use GridFS. Is there any way to make the document smaller or to split it into subdocuments? If so, how can this be achieved?
from pymongo import MongoClient
conn = MongoClient("mongodb://sample_mongo:27017")
db_conn = conn["test"]
db_collection = db_conn["sample"]
# the encoded record is about 22 MB (22451007 bytes according to the error)
record = {
"name": "drugs",
"collection_id": 23,
"timestamp": 1515065002,
"tokens": [], # contains list of strings
"tokens_missing": [], # contains list of strings
"token_mapping": {} # Dictionary contains transformed tokens
}
db_collection.insert(record, check_keys=False)
I got the error DocumentTooLarge: BSON document too large. In MongoDB, the maximum BSON document size is 16 megabytes.
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 2501, in insert
check_keys, manipulate, write_concern)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 575, in _insert
check_keys, manipulate, write_concern, op_id, bypass_doc_val)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 556, in _insert_one
check_keys=check_keys)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 482, in command
self._raise_connection_failure(error)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 610, in _raise_connection_failure
raise error
DocumentTooLarge: BSON document too large (22451007 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.
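One way to do the splitting the question asks about, without GridFS, is to store the record as several ordinary documents that share the scalar fields and each carry a slice of the large lists and dicts. The sketch below is only an illustration under assumptions: `split_record`, `join_parts`, the extra `part`/`n_parts` fields and the chunk size are all hypothetical names and values, and the code counts items rather than bytes, so the chunk size has to be picked (or checked, e.g. with `len(bson.BSON.encode(part))` from pymongo's `bson` package) so that every part stays under the 16 MB limit. It uses only syntax that runs on both Python 2.7 (as in the traceback) and Python 3.

```python
def split_record(record, split_fields, chunk_size):
    """Split one oversized record into several smaller documents.

    Fields not listed in split_fields (name, collection_id, ...) are copied
    into every part. Each list or dict field in split_fields is cut into
    slices of at most chunk_size items.
    """
    # Materialise dicts as (key, value) pairs so they can be sliced like lists.
    seqs = {}
    for f in split_fields:
        value = record[f]
        seqs[f] = list(value.items()) if isinstance(value, dict) else list(value)

    longest = max([len(s) for s in seqs.values()] or [0])
    n_parts = max(1, (longest + chunk_size - 1) // chunk_size)  # ceiling division
    base = {k: v for k, v in record.items() if k not in split_fields}

    parts = []
    for i in range(n_parts):
        part = dict(base)
        part["part"] = i            # hypothetical field: index of this piece
        part["n_parts"] = n_parts   # hypothetical field: total number of pieces
        for f in split_fields:
            piece = seqs[f][i * chunk_size:(i + 1) * chunk_size]
            part[f] = dict(piece) if isinstance(record[f], dict) else piece
        parts.append(part)
    return parts


def join_parts(parts, split_fields):
    """Rebuild the original record from its pieces, given in any order."""
    parts = sorted(parts, key=lambda p: p["part"])
    skip = set(split_fields) | {"part", "n_parts", "_id"}
    record = {k: v for k, v in parts[0].items() if k not in skip}
    for f in split_fields:
        if isinstance(parts[0][f], dict):
            merged = {}
            for p in parts:
                merged.update(p[f])
            record[f] = merged
        else:
            record[f] = [item for p in parts for item in p[f]]
    return record
```

With the question's collection this might be used as `fields = ["tokens", "tokens_missing", "token_mapping"]`, then `db_collection.insert_many(split_record(record, fields, 100000))`, and later `join_parts(list(db_collection.find({"collection_id": 23})), fields)` to read it back; 100000 is an arbitrary placeholder, not a recommended value. Splitting by item count keeps the logic simple, at the cost of having to verify the byte size separately.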
The maximum BSON document size is 16 megabytes. To store documents larger than this limit, MongoDB provides the GridFS API.
GridFS is a specification for storing and retrieving files that exceed the 16 MB BSON document size limit. Instead of keeping a large file in a single document, GridFS divides it into parts, or chunks, and stores each chunk as a separate document. The default chunk size is 255 KB. GridFS uses two collections to store files (by default `fs.chunks` and `fs.files`): one stores the file chunks and the other stores the file metadata.
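To make the chunking described above concrete, here is a stdlib-only sketch that mimics the GridFS layout: one metadata document with `length` and `chunkSize`, and chunk documents identified by `files_id` and sequence number `n`. It does not connect to MongoDB and is not pymongo's implementation; the function names are hypothetical.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KB = 261120 bytes


def to_chunks(data, files_id, chunk_size=DEFAULT_CHUNK_SIZE):
    """Cut a byte string into GridFS-style chunk documents."""
    return [
        {"files_id": files_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def files_doc(data, files_id, chunk_size=DEFAULT_CHUNK_SIZE):
    """Metadata document, analogous to an entry in the files collection."""
    return {"_id": files_id, "length": len(data), "chunkSize": chunk_size}


def from_chunks(chunks):
    """Reassemble the original bytes by ordering the chunks on n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

For a 22451007-byte payload (the size reported in the traceback) this produces 86 chunks of at most 261120 bytes each, far below the 16 MB limit. With a live server, pymongo's `gridfs` module does the same work: `fs = gridfs.GridFS(db_conn)`, `file_id = fs.put(json.dumps(record).encode("utf-8"))`, and `json.loads(fs.get(file_id).read().decode("utf-8"))`. If GridFS itself is ruled out, chunk dicts like these could be inserted into an ordinary collection with `insert_many` on Python 3, where `bytes` values are stored as BSON binary.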