Why does 24 MB of CSV data become 230 MB in MongoDB collection?

My Meteor app takes a CSV file, parses it with Baby Parse (Papa Parse for the server) and inserts the data into a MongoDB collection.

Each CSV row is inserted as a document. The 24 MB CSV file contains ~900,000 rows, so there are ~900,000 documents in the collection. Each document has 5 fields, including the document's unique _id.

When I use dataSize() to get the collection size, I get 230172976. If I'm not mistaken, this number is in bytes, so that is 230 MB.
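To put the numbers from the question side by side, here is a quick per-row calculation. It takes ~900,000 as exactly 900,000 and treats 24 MB as 24,000,000 bytes; both are simplifying assumptions.

```python
# Figures taken from the question. Assumptions: exactly 900,000 rows,
# and 24 MB means 24 * 10**6 bytes.
csv_bytes = 24 * 10**6
rows = 900_000
data_size = 230_172_976  # value reported by db.collection.dataSize()


def bytes_per_row(total_bytes: int, count: int) -> float:
    """Average size of one CSV row or one MongoDB document, in bytes."""
    return total_bytes / count


csv_avg = bytes_per_row(csv_bytes, rows)
mongo_avg = bytes_per_row(data_size, rows)

print(f"CSV:     {csv_avg:.1f} bytes per row")
print(f"MongoDB: {mongo_avg:.1f} bytes per document")
print(f"Growth:  {data_size / csv_bytes:.1f}x")
```

So each row goes from roughly 27 bytes of text to roughly 256 bytes per document, close to a tenfold increase.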

Why is this gigantic increase happening? How can I fix this?

asked Dec 24 '15 by stackyname


1 Answer

This is because the value returned by dataSize() includes record padding. Also note that if your documents don't have an _id field, one is added, and each ObjectId _id value takes 12 bytes. You may want to read about Record Allocation Strategies.
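A rough way to see where the bytes go is to compute the BSON size of one document by hand. BSON stores every field name inside every document, along with a type byte per field, length prefixes and terminators, so a short CSV row grows even before any padding is added. The sketch below follows the BSON encoding rules for a few value types; the field names and values are hypothetical (the question doesn't show them), and FakeObjectId is a stand-in for a real ObjectId.

```python
class FakeObjectId:
    """Stand-in for bson.ObjectId; BSON always stores an ObjectId in 12 bytes."""


def bson_value_size(value) -> int:
    """Encoded size in bytes of a single BSON value (payload only)."""
    if isinstance(value, FakeObjectId):
        return 12
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8  # int32 or int64
    if isinstance(value, float):
        return 8  # double
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # int32 length + bytes + NUL
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_document_size(doc: dict) -> int:
    """Size of a flat BSON document: int32 length + elements + trailing NUL."""
    size = 4 + 1
    for name, value in doc.items():
        # type byte + field name as a NUL-terminated C string + value
        size += 1 + len(name.encode("utf-8")) + 1 + bson_value_size(value)
    return size


# Hypothetical CSV row and the 5-field document it would become.
csv_row = "Alice,Berlin,34,7.5\n"
doc = {"_id": FakeObjectId(), "name": "Alice", "city": "Berlin", "age": 34, "score": 7.5}

print(len(csv_row.encode("utf-8")), "bytes in CSV")
print(bson_document_size(doc), "bytes as BSON")
```

For this made-up row, 20 bytes of CSV become 79 bytes of BSON, and 29 of those 79 bytes are just field names and type bytes, repeated in every one of the ~900,000 documents.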

As for how to fix this:

Use the collMod command with the noPadding flag, or create the collection with the db.createCollection() method and the noPadding option. But you probably shouldn't, because, as the documentation says:

Only set noPadding to true for collections whose workloads have no update operations that cause documents to grow, such as for collections with workloads that are insert-only.
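To see why padding matters, here is a sketch of power-of-2 sized allocation, which Record Allocation Strategies describes as the MMAPv1 default in MongoDB 3.0. It assumes allocations start at 32 bytes and double up to 2 MB, with larger records rounded up to a multiple of 2 MB, and it ignores per-record headers, so treat the results as approximations. With noPadding, the record is sized to fit the document exactly.

```python
TWO_MB = 2 * 1024 * 1024


def padded_record_size(bson_bytes: int, no_padding: bool = False) -> int:
    """Approximate MMAPv1 record size for a document of the given BSON size.

    Assumption: power-of-2 allocation starting at 32 bytes, doubling up to
    2 MB, then whole multiples of 2 MB. Record headers are ignored.
    """
    if no_padding:
        return bson_bytes  # exact fit, no room left for growth
    if bson_bytes > TWO_MB:
        return -(-bson_bytes // TWO_MB) * TWO_MB  # round up to a multiple of 2 MB
    size = 32
    while size < bson_bytes:
        size *= 2
    return size


for doc_size in (79, 130, 200):
    print(doc_size, "->", padded_record_size(doc_size),
          "with padding,", padded_record_size(doc_size, no_padding=True), "without")
```

Under these assumptions a 130-byte document occupies a 256-byte slot, nearly double its size; the spare room is what lets a document grow in place on update, which is why the documentation restricts noPadding to insert-only workloads.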

As Pete Garafano pointed out in a comment, this applies only to the MMAPv1 storage engine, which is the default in MongoDB 3.0 and all earlier versions.

MongoDB 3.2 uses the WiredTiger storage engine by default, so to use that option you need to switch the storage engine to MMAPv1, either in your configuration file or with the --storageEngine option.

answered Oct 05 '22 by styvane