
Pymongo bulk inserts

I am trying to insert documents in bulk, but no more than 84 documents get inserted during the bulk insert, and I get this error:

pymongo.errors.InvalidOperation: cannot do an empty bulk insert

Is it possible to do batch inserts, like inserting 50 documents per insert?

asked Feb 02 '14 by blackmamba

3 Answers

Check out the documentation for bulk inserts in PyMongo. You just pass a list of dicts to insert(). If the list is empty, PyMongo raises an exception, as you've observed.
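
For example, here is a minimal sketch of inserting 50 documents per batch while guarding against the empty-list case. The collection setup and documents are made up for illustration; insert() accepted a list in PyMongo 2.x as this answer describes, while on 3.0+ insert_many() is the equivalent call:

from pymongo import MongoClient

client = MongoClient()
collection = client.test_db.test_collection
docs = [{'i': i} for i in range(1000)]  # hypothetical documents

batch_size = 50
for start in range(0, len(docs), batch_size):
    batch = docs[start:start + batch_size]
    # Defensive check: passing an empty list raises InvalidOperation
    if batch:
        collection.insert(batch)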

answered Oct 17 '22 by A. Jesse Jiryu Davis

Late to the game here, but I have had good success with the bulk operations described in the PyMongo docs (http://api.mongodb.com/python/current/examples/bulk.html). The insert_many() method already does the necessary chunking under the hood. My workflow involved one large bulk insert followed by many subsequent full-collection updates. Using the bulk update process was many times faster than the looped single update, though the percentage speedup varied with the size of the input (chunk sizes from 10 up to 100,000, timed below).

from pymongo.errors import BulkWriteError

def unordered_bulk_write():
    # Queue every update client-side in one unordered bulk operation,
    # then send the whole batch to the server with a single execute().
    bulk_op = collection.initialize_unordered_bulk_op()

    for primary_key in primary_key_list:
        bulk_op.find({'fubar_key': primary_key}).update(
            {'$set': {'dopeness_factor': 'unlimited'}})

    try:
        bulk_op.execute()
    except BulkWriteError as bwe:
        # bwe.details carries per-operation error information
        print(bwe, bwe.details)

def single_update_write():
    # One round trip to the server per document updated.
    for primary_key in primary_key_list:
        collection.update_one(
            {'fubar_key': primary_key},
            {'$set': {'dopeness_factor': 'unlimited'}})

These methods were run in an IPython notebook with the %%timeit magic, and I got the following stats. Each method was mapped over a chunk of randomly selected primary keys, with increasing chunk sizes.

WITH CHUNK_SIZE = 10
UNORDERED BULK WRITE = 1000 loops, best of 3: 871 µs per loop
SINGLE UPDATE ONE = 100 loops, best of 3: 2.47 ms per loop

WITH CHUNK_SIZE = 100
UNORDERED BULK WRITE = 100 loops, best of 3: 4.57 ms per loop
SINGLE UPDATE ONE = 10 loops, best of 3: 26.2 ms per loop

WITH CHUNK_SIZE = 1000
UNORDERED BULK WRITE = 10 loops, best of 3: 39 ms per loop
SINGLE UPDATE ONE = 1 loops, best of 3: 246 ms per loop

WITH CHUNK_SIZE = 10000
UNORDERED BULK WRITE = 1 loops, best of 3: 399 ms per loop
SINGLE UPDATE ONE = 1 loops, best of 3: 2.58 s per loop

WITH CHUNK_SIZE = 100000
UNORDERED BULK WRITE = 1 loops, best of 3: 4.34 s per loop
SINGLE UPDATE ONE = 1 loops, best of 3: 24.8 s per loop
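
For reference, initialize_unordered_bulk_op() was deprecated in later PyMongo releases; the same unordered batch of updates can be written with bulk_write() and UpdateOne. A minimal sketch, assuming the same collection and primary_key_list names as above (the keys themselves are hypothetical):

from pymongo import MongoClient, UpdateOne
from pymongo.errors import BulkWriteError

client = MongoClient()
collection = client.test_db.test_collection
primary_key_list = [1, 2, 3]  # hypothetical keys

# Build the whole batch client-side, then send it in one call.
requests = [
    UpdateOne({'fubar_key': primary_key},
              {'$set': {'dopeness_factor': 'unlimited'}})
    for primary_key in primary_key_list
]

try:
    # ordered=False matches the unordered bulk op above: the server
    # keeps going past individual failures instead of stopping early.
    result = collection.bulk_write(requests, ordered=False)
    print(result.modified_count)
except BulkWriteError as bwe:
    print(bwe.details)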
answered Oct 17 '22 by chill_turner


Already answered here: Mongodb bulk insert limit in Python

You don't really need to batch the inserts yourself. Just insert iteratively, and PyMongo will be responsible for chunking the data into maximum byte size, or collecting the inserted data for some time until it reaches the maximum byte size, before bulk inserting it into the database.

MongoDB itself has a message size limit (maxMessageSizeBytes), which is equal to 48000000 bytes (maxBsonObjectSize * 3).
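
As a rough sketch of that point (the collection setup is hypothetical), a single insert_many() call can take the whole list, and PyMongo splits it into as many messages as needed to stay under those limits:

from pymongo import MongoClient

client = MongoClient()
collection = client.test_db.test_collection

docs = [{'seq': i} for i in range(500000)]  # hypothetical documents

# PyMongo splits this into multiple wire-protocol messages under
# the hood; no manual batching is required.
result = collection.insert_many(docs, ordered=False)
print(len(result.inserted_ids))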

answered Oct 17 '22 by Aminah Nuraini