I am using MongoDB for storing log files. Both MongoDB and MySQL are running on the same machine, and virtualizing the MongoDB environment is not an option. I am afraid I will soon run into performance issues as the logs table grows very fast. Is there a way to limit resident memory for MongoDB so that it won't eat all available memory and excessively slow down the MySQL server?
DB machine: Debian 'lenny' 5
Other solutions (please comment):
As we need all historical data, we cannot use capped collections, but I am also considering using a cron script that dumps and deletes old data
Should I also consider using smaller keys, as suggested on other forums?
MongoDB's WiredTiger engine, in its default configuration, uses the larger of 256 MB or 50% of (RAM − 1 GB) for its cache. You can limit the cache size by setting the cacheSizeGB option in the /etc/mongod.conf configuration file.
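A minimal /etc/mongod.conf fragment capping the WiredTiger cache (the 1 GB value is an assumption; size it for your own machine and workload):

```yaml
# /etc/mongod.conf — limit WiredTiger's internal cache
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
```

Note this limits only WiredTiger's internal cache; mongod still uses additional memory for connections, aggregation, and the filesystem cache.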
Hey Vlad, you have a couple of simple strategies here regarding logs.
The first thing to know is that Mongo can generally handle lots of successive inserts without a lot of RAM. The reason is simple: you only insert or update recent documents, so the index size grows, but the data itself is constantly paged out.
Put another way, you can break out the RAM usage into two major parts: index & data.
If you're running typical logging, the data portion is constantly being flushed away, so only the index really stays in RAM.
The second thing to know is that you can mitigate the index issue by putting logs into smaller buckets. Think of it this way: if you collect each day's logs into a date-stamped collection (call it logs20101206), then you can also control the size of the index in RAM.
As you roll over days, the old index will flush from RAM and it won't be accessed again, so it will simply go away.
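The day-bucketing scheme above can be sketched as follows; the collection-name pattern comes from the logs20101206 example, while the retention helper and its parameters are hypothetical, meant for a nightly cron job that drops expired buckets:

```python
from datetime import date, timedelta

def log_collection_name(day: date) -> str:
    # One collection per day keeps each index small, e.g. "logs20101206".
    return "logs" + day.strftime("%Y%m%d")

def buckets_older_than(today: date, retention_days: int,
                       lookback_days: int = 7) -> list[str]:
    # Hypothetical helper: names of day-buckets that have aged past the
    # retention window and are candidates for db.<name>.drop().
    # Looks back a bounded number of days past the cutoff.
    cutoff = today - timedelta(days=retention_days)
    return [
        log_collection_name(cutoff - timedelta(days=i))
        for i in range(1, lookback_days + 1)
    ]
```

The application writes to log_collection_name(date.today()); old buckets are dropped whole, which is far cheaper than deleting individual documents.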
but I am also considering using a cron script that dumps and deletes old data
This method of logging by days also helps delete old data. In three months, when you're done with the data, you simply run db.logs20101206.drop() and the collection instantly goes away. Note that you don't reclaim disk space (it's all pre-allocated), but new data will fill up the empty spot.
Should I also consider using smaller keys, as suggested on other forums?
Yes.
In fact, I have it built into my data objects: I access data using logs.action or logs->action, but underneath, the data is actually saved to logs.a. It's really easy to spend more space on "fields" than on "values", so it's worth shrinking the "fields" and trying to abstract the mapping away elsewhere.
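A minimal sketch of such a mapping layer; the field names and their one-letter equivalents are assumptions, following the logs.action → logs.a example above:

```python
# Hypothetical field map: readable names in application code,
# one-letter keys on disk to save space per document.
FIELD_MAP = {"action": "a", "timestamp": "t", "user": "u"}
REVERSE_MAP = {short: long for long, short in FIELD_MAP.items()}

def to_storage(doc: dict) -> dict:
    # Shrink known field names just before inserting; unknown keys pass through.
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

def from_storage(doc: dict) -> dict:
    # Restore readable names when reading documents back.
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}
```

Keeping the map in one place means the rest of the code never sees the short keys.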
For version 3.2+, which uses the WiredTiger storage engine, the option --wiredTigerCacheSizeGB is relevant to the question. You can set it if you know exactly what you are doing. I don't know whether it's best practice; I just read it in the documentation and am raising it here.
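On the command line that looks like the following (a sketch; the 1 GB value and dbpath are assumptions to adapt to your machine):

```shell
# Cap WiredTiger's internal cache at 1 GB when starting mongod
mongod --wiredTigerCacheSizeGB 1 --dbpath /var/lib/mongodb
```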