
Fast import into MongoDB

Tags: c#, mongodb, gridfs

I have around 2 million strings of varying length that I need to compress and store in MongoDB GridFS as files.

The strings are currently stored in a TEXT column of an MS SQL table. I wrote a sample app that reads each row, compresses it, and stores it as a GridFS file.
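
For reference, the compression step could look roughly like the sketch below. It assumes GZip over UTF-8 bytes; the post does not say which codec or encoding the sample app uses, and `TextCompressor` is a hypothetical name.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class TextCompressor
{
    // Encodes the string as UTF-8 and returns the GZip-compressed bytes.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer so that
            // the final block and the GZip footer are flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress: inflates the bytes and decodes them as UTF-8.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The byte array returned by `Compress` is what would be handed to the GridFS upload call for each row.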

There is one reader and a thread pool of 50 threads storing the results. It works, but it is very slow: about 100 records per second on average.
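
The one-reader, many-writers layout described above can be sketched as a bounded producer/consumer queue. This is an illustration, not the actual sample app: `ImportPipeline`, the `store` callback (standing in for the GridFS upload), and the queue capacity of 1000 are all assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ImportPipeline
{
    // Feeds rows from a single reader into `workerCount` workers through a
    // bounded queue and returns the number of rows passed to `store`.
    // `store` is a hypothetical stand-in for "compress and upload to GridFS".
    public static int Run(IEnumerable<string> rows, int workerCount, Action<string> store)
    {
        int stored = 0;
        // The bound (an arbitrary 1000 here) keeps the reader from running
        // far ahead of the writers and exhausting memory.
        using (var queue = new BlockingCollection<string>(1000))
        {
            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Factory.StartNew(() =>
                {
                    // Blocks until an item is available; ends once the queue
                    // is marked complete and drained.
                    foreach (string row in queue.GetConsumingEnumerable())
                    {
                        store(row);
                        Interlocked.Increment(ref stored);
                    }
                }, TaskCreationOptions.LongRunning);
            }

            // The single reader: enqueue every row, then signal completion.
            foreach (string row in rows)
            {
                queue.Add(row);
            }
            queue.CompleteAdding();
            Task.WaitAll(workers);
        }
        return stored;
    }
}
```

With this shape, throughput is bounded by whichever side is slower, so a low rate such as 100 records per second points at the per-file cost of `store` rather than at the reader. The sketch does no error handling: if `store` throws in every worker, the reader can block on a full queue.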

Is there a faster way to import into GridFS?

I'm using MongoDB 1.6 on Windows with the MongoCSharp driver, in C# on .NET.

asked Mar 21 '26 10:03 by Khash


1 Answer

I think I found the issue in the MongoDB CSharp driver by profiling it while running a very simple app that writes 1000 strings to 1000 GridFS files.

It turns out that 97% of the time is spent checking whether a file with the same filename already exists in the collection. Without an index, that check has to scan the files collection on every insert. I added an index on the filename field and it's now blazing fast!
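
As an illustration of the fix, here is a minimal sketch of creating that index from C#. It uses the current official `MongoDB.Driver` NuGet package, not the driver from this post, whose API differs. It also assumes the default GridFS bucket prefix `fs`, so the metadata collection is `fs.files`. `GridFsIndexes` is a hypothetical helper name. In the mongo shell of the 1.6 era the equivalent was `db.fs.files.ensureIndex({ filename: 1 })`.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

public static class GridFsIndexes
{
    // Creates an ascending index on the `filename` field of the GridFS
    // files collection and returns the index name. Creating an index that
    // already exists with the same definition is a no-op on the server.
    public static string EnsureFilenameIndex(IMongoDatabase database, string bucketName = "fs")
    {
        var files = database.GetCollection<BsonDocument>(bucketName + ".files");
        var keys = Builders<BsonDocument>.IndexKeys.Ascending("filename");
        return files.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(keys));
    }
}
```

It would be called once before the import starts, for example `GridFsIndexes.EnsureFilenameIndex(new MongoClient("mongodb://localhost:27017").GetDatabase("import"))`, where the connection string and the database name `import` are placeholders.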

My remaining question: if the driver needs filenames to be unique and checks for that, why doesn't it create a unique index on the field when one is missing? What's the reason behind that?

answered Mar 23 '26 00:03 by Khash


