I created a collection in MongoDB consisting of 11446615 documents.
Each document has the following form:
{
"_id" : ObjectId("4e03dec7c3c365f574820835"),
"httpReferer" : "http://www.somewebsite.pl/art.php?id=13321&b=1",
"words" : ["SEX", "DRUGS", "ROCKNROLL", "WHATEVER"],
"howMany" : 3
}
httpReferer: just a URL
words: words parsed from the URL above. The size of the list is between 15 and 90.
I am planning to use this database to obtain a list of web pages which have similar content.
I'll be querying this collection using the words field, so I created (or rather started creating) an index on this field:
db.my_coll.ensureIndex({words: 1})
I started creating the index about 3 hours ago, and it doesn't look like it will finish in another 3 hours.
How can I increase the speed of indexing? Or maybe I should use a completely different approach to this problem? Any ideas are welcome :)
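Background indexes also have some issues.
If you have a replica set, I prefer to do a "rolling index build": build the index on one secondary at a time while it is taken out of the set, then step down the primary and build it there last.
I think this is the cleanest solution.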
Nope, indexing is slow for large collections. You can also create the index in the background:
db.my_coll.ensureIndex({words:1}, {background:true});
Creating the index in the background will be slower and result in a larger index. However, the index won't be used until the build is complete, so in the meantime you'll be able to use the database normally and the build won't block other operations.
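To get a feel for why the build takes so long either way, here is a rough back-of-the-envelope sketch. It assumes that an index on an array field stores about one entry per array element per document (duplicate words within one document would lower the count), and it uses the document count and list sizes from the question. The function name estimateIndexEntries is only for illustration.

```javascript
// Rough estimate of how many entries an index on the "words" array needs.
// Assumption: roughly one index entry per array element per document.
const documents = 11446615; // collection size from the question
const minWords = 15;        // smallest "words" list
const maxWords = 90;        // largest "words" list

// Hypothetical helper: total entries if every document had wordsPerDoc words.
function estimateIndexEntries(docCount, wordsPerDoc) {
  return docCount * wordsPerDoc;
}

const low = estimateIndexEntries(documents, minWords);
const high = estimateIndexEntries(documents, maxWords);

console.log("index entries: between " + low + " and " + high);
```

Under these assumptions the index ends up with somewhere between about 172 million and 1.03 billion entries, so the build is much more work than the 11 million documents alone suggest.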