Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Surprisingly slow insertion to mongodb capped collections

I'm using mongodb's capped collections + tailable cursors as a pubsub engine, using pymongo. (This approach is described here). My environment includes several topics, several publishers, and a few dozen subscribers.

This works great, except that every once in a while I encounter insert-operations which take surprisingly long time to complete.

My understanding is that mongodb can support insertion of >10K docs/sec, which is why I am surprised that a single insert operation can take 85ms. (I also observed >1sec a few times in my production environment, but couldn't reproduce it using this SSCCE).

To keep delays to a minimum, journaling is disabled, and I am using the lowest write-concern, which helped, but didn't solve the problem.

What causes this and how can I fix it?


Here is a SSCCE. I run the following script twice in parallel:

import time, pymongo, pandas as pd
DBNAME = 'test5'
COLLNAME = 'abc'
mongo = pymongo.MongoClient()
#mongo[DBNAME].drop_collection(COLLNAME)
if DBNAME not in mongo.database_names():
    coll = mongo[DBNAME].create_collection(COLLNAME,
                                    capped = True, size = 50 * 1024 * 1024, max = 1024 * 1024,
                                    fsync = False,  # no delays please (FYI, journaling is disabled)
                                    #autoIndexId = False,
                                    )

tarr = []
for _ in range(1000):
    msg = dict.fromkeys([ 'f%d' % j for j in range(30) ], 'foo')
    t1 = time.time()
    coll.insert(msg)  # <-- INSERT() HERE
    tarr.append(time.time() - t1)
    time.sleep(1./100)  # 100 docs/sec. surely not too much load...

print pd.Series(tarr).describe()

Ouput:

RUN1:
count    1000.000000
mean        0.001555
std         0.002712
min         0.000654
25%         0.001021
50%         0.001084
75%         0.001151
max         0.065006  <-- 65ms!! 
dtype: float64

RUN2:
count    1000.000000
mean        0.001733
std         0.004013
min         0.000717
25%         0.001043
50%         0.001069
75%         0.001120
max         0.084803  <-- 84ms!! 
dtype: float64

Index information:

mongo[DBNAME][COLNAME].index_information()
> {u'_id_': {u'key': [(u'_id', 1)], u'ns': u'xxxxx', u'v': 1}}

mongodb info:

db version v3.2.0-rc0
git version: bf28bd20fa507c4d8cc5919dfbbe87b7750ae8b0
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1404
    distarch: x86_64
    target_arch: x86_64

"storageEngine" : { "name" : "mmapv1", "supportsCommittedReads" : false }

journaling DISABLED
pymongo version = 3.0.3

Output of iostat -xtm 5 during the experiment:

*** BEFORE ***
11/08/2015 09:05:02 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.11    0.95    0.49    0.10    0.00   92.35

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    10.20    0.00    6.80     0.00     0.08    24.71     0.14   20.12    0.00   20.12   2.82   1.92
dm-0              0.00     0.00    0.00   17.00     0.00     0.08     9.88     0.24   13.84    0.00   13.84   1.13   1.92
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    7.80     0.00     0.03     8.41     0.04    4.92    0.00    4.92   2.05   1.60
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00


*** DURING ***
11/08/2015 09:05:07 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.25    0.26    0.55    0.26    0.00   89.67

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    13.60    0.00   40.60     0.00     0.59    29.52     4.84  119.15    0.00  119.15   1.60   6.48
dm-0              0.00     0.00    0.00   54.20     0.00     0.59    22.11     5.00   92.18    0.00   92.18   1.20   6.48
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    4.20     0.00     0.02     9.14     0.01    2.29    0.00    2.29   2.29   0.96
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

11/08/2015 09:05:12 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.02    0.26    0.41    0.44    0.00   90.88

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    19.00    0.00  109.80     0.00     0.63    11.82    22.95  209.01    0.00  209.01   1.84  20.24
dm-0              0.00     0.00    0.00  128.80     0.00     0.63    10.07    25.55  198.37    0.00  198.37   1.57  20.24
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    4.40     0.00     0.02     8.00     0.02    4.73    0.00    4.73   2.18   0.96
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

11/08/2015 09:05:17 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.81    0.27    0.39    0.16    0.00   93.37

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     8.40    0.00   24.80     0.00     0.29    24.19     0.26   10.55    0.00   10.55   1.94   4.80
dm-0              0.00     0.00    0.00   33.20     0.00     0.29    18.07     0.31    9.35    0.00    9.35   1.45   4.80
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    3.40     0.00     0.01     8.94     0.01    2.35    0.00    2.35   2.35   0.80
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00


*** AFTER ***
11/08/2015 09:05:22 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.85    0.40    0.46    0.00    0.00   93.28

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     3.80    0.00    0.40     0.00     0.02    84.00     0.00    6.00    0.00    6.00   6.00   0.24
dm-0              0.00     0.00    0.00    4.20     0.00     0.02     8.00     0.03    7.81    0.00    7.81   0.57   0.24
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    1.60     0.00     0.01    11.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

UPDATE: we switched mongodb to run in-memory (along the lines of what's described here). The machine has enough memory to sustain it. Surprisingly, it didn't help! We're still seeing the same "spikes" in insertion times. Something weird is happening. I have no idea what's causing it.

UPDATE: we also upgraded to mongodb V3.2, in-memory. Same results.

like image 221
shx2 Avatar asked Nov 04 '15 11:11

shx2


People also ask

How do I update a capped collection in MongoDB?

You can convert a non-capped collection to a capped collection with the convertToCapped command: db. runCommand({"convertToCapped": "mycoll", size: 100000}); The size parameter specifies the size of the capped collection in bytes.

What is a capped collection in MongoDB?

Capped collection in MongoDB supports high-throughput operations used for insertion and retrieval of documents based on the order of insertion. Capped collection working is similar to circular buffers, i.e., there is a fixed space allocated for the capped collection.

What are the characteristics of capped collection in MongoDB?

Capped collections restrict updates to the documents if the update results in increased document size. Since capped collections store documents in the order of the disk storage, it ensures that the document size does not increase the size allocated on the disk.

How do I delete a capped collection in MongoDB?

You cannot delete documents from a capped collection. It can only be deleted automatically upon insertion of new documents when the allocated size to the collection has been exhausted. After reading the documents from a capped collection, MongoDB returns the same document in the order which they were present on disk.


1 Answers

Looks like the problem is that you're only using fully non-blocking writes when creating the collection (calling create_collection), not when inserting.

You can disable write acknowledgement at the MongoClient level as:

mongo = pymongo.MongoClient(w=0)

Or for just the insert calls as:

coll.insert(msg, w=0)

Either way, the max time drops to 0.001 when I try this locally on a normal disk-backed 3.2 MongoDB instance using WiredTiger storage.

like image 111
JohnnyHK Avatar answered Sep 20 '22 17:09

JohnnyHK