
Creating an index takes a very long time

Tags:

mongodb

I created a collection in MongoDB consisting of 11446615 documents.

Each document has the following form:

{ 
 "_id" : ObjectId("4e03dec7c3c365f574820835"), 
 "httpReferer" : "http://www.somewebsite.pl/art.php?id=13321&b=1", 
 "words" : ["SEX", "DRUGS", "ROCKNROLL", "WHATEVER"],     
 "howMany" : 3 
}

httpReferer: just a URL

words: words parsed from the url above. Size of the list is between 15 and 90.

I am planning to use this database to obtain a list of web pages which have similar content.

I'll be querying this collection using the words field, so I created (or rather started creating) an index on this field:

db.my_coll.ensureIndex({words: 1})

I started creating the index about 3 hours ago and it doesn't look like it will finish in another 3 hours.

How can I increase the speed of indexing? Or maybe I should use a completely different approach to this problem? Any ideas are welcome :)

whysoserious asked Jun 24 '11 10:06


2 Answers

Background index builds also have some issues:

  1. If anything, a background build will take longer because of the other load on your server.
  2. If it is interrupted for some reason, it will restart as a foreground build.

If you have a replica set, I prefer to do a "rolling index build":

  1. Take a secondary out of the replica set.
  2. Build the index on it.
  3. Put the secondary back into the replica set.

I think this is the cleanest solution.
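
Step 2 of the rolling build could be sketched in the mongo shell roughly as follows. This is a minimal sketch, not a definitive procedure: `buildAndVerify` is a hypothetical helper name, and it assumes steps 1 and 3 (taking the secondary out of the set and putting it back) are done separately, outside the shell. It relies only on `ensureIndex` and `getIndexes`, and takes the collection as a parameter.

```javascript
// Hypothetical helper (not part of MongoDB): builds an index on a collection
// and reports whether an index with that key now exists.
// Assumption: `collection` is a mongo shell collection such as db.my_coll,
// and the shell is connected to the secondary while it is out of the set.
function buildAndVerify(collection, keySpec) {
  // Foreground build; this call blocks until the index is finished.
  collection.ensureIndex(keySpec);

  // getIndexes() returns one description per index, each with a `key` field.
  var wanted = JSON.stringify(keySpec);
  return collection.getIndexes().some(function (index) {
    return JSON.stringify(index.key) === wanted;
  });
}
```

In the shell this would be called as `buildAndVerify(db.my_coll, {words: 1})`, once for each secondary in turn.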

Dharshan answered Nov 16 '22 01:11


Nope, indexing is just slow for large collections. You can build the index in the background instead:

db.my_coll.ensureIndex({words:1}, {background:true});

Creating the index in the background will be slower and result in a larger index. However, the index won't be used until the build is complete, so in the meantime you'll be able to use the database normally and the build won't block other operations.
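
To get a rough sense of why the build takes so long: an index on an array field stores one entry per array element, not one per document. Below is a back-of-the-envelope sketch in plain JavaScript using the figures from the question (11446615 documents, 15 to 90 words each). It assumes the words within a document are distinct, so treat the result as an estimate rather than a measurement.

```javascript
// Figures taken from the question.
var documentCount = 11446615;
var minWordsPerDoc = 15;
var maxWordsPerDoc = 90;

// An index on an array field gets one entry per array element,
// so the entry count scales with documents * words per document.
// Assumption: words are distinct within each document.
function estimateIndexEntries(docs, wordsPerDoc) {
  return docs * wordsPerDoc;
}

var lowEstimate = estimateIndexEntries(documentCount, minWordsPerDoc);
var highEstimate = estimateIndexEntries(documentCount, maxWordsPerDoc);

console.log("index entries: between " + lowEstimate + " and " + highEstimate);
// prints "index entries: between 171699225 and 1030195350"
```

So the index on words would hold somewhere between roughly 170 million and 1 billion entries, which is far more than the document count alone suggests.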

Andz answered Nov 16 '22 03:11