
Removing old documents from Mongo by id

Tags: shell, mongodb

In my current project we use Mongo to store a lot of documents (approximately 100 billion). How do I remove the oldest half of the documents using the _id field? If I use the indexed field "timestamp", the operation will take ~3 years at the current speed.

asked Apr 19 '12 11:04 by marco_manti



1 Answer

Here is a link to a MongoDB-User Google Groups post that discusses generating ObjectIds based on time stamps: http://groups.google.com/group/mongodb-user/browse_thread/thread/262223bb0bd52a83/3fd9b01d0ad2c41b

From the post: extracting the time stamp from Mongo ObjectIds is explained in the MongoDB documentation page "Optimizing Object IDs": http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield
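As a rough illustration of that idea, the creation time can be read back from the first 8 hex characters of an ObjectId string. The helper name below is hypothetical, and the sketch is plain JavaScript so that it runs outside the mongo shell; inside the shell, ObjectId values also provide getTimestamp().

```javascript
// Hypothetical helper: recover the creation time embedded in an ObjectId.
// The first 4 bytes (8 hex characters) hold the Unix time in seconds.
function timestampFromObjectIdHex(idHex) {
  const seconds = parseInt(idHex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = timestampFromObjectIdHex("4f9061260000000000000000");
console.log(created.toISOString()); // 2012-04-19T19:01:58.000Z
```

The result only has one-second resolution, because that is all the ObjectId stores.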

Following the example in the post, an ObjectId can be created from a Unix time in seconds:

> now = new Date()
ISODate("2012-04-19T19:01:58.841Z")
> ms = now.getTime()
1334862118841
> sec = Math.floor(ms/1000)
1334862118
> hex = sec.toString(16)
4f906126
> id_string = hex + "0000000000000000"
4f9061260000000000000000
> my_id = ObjectId(id_string)
ObjectId("4f9061260000000000000000")

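Using the above formula, you can create an ObjectId from any date and query for documents whose _id is smaller than it; those are the documents created before that date. This assumes the _id values are default, driver-generated ObjectIds.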
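The formula can be wrapped into a small function. This is a minimal sketch in plain JavaScript: the function name and the collection name "mycollection" are assumptions for illustration, and the remove call is left as a comment because it only makes sense inside the mongo shell against a real collection.

```javascript
// Hypothetical helper: build the 24-character hex string of the smallest
// possible ObjectId for a given date (timestamp bytes, then all zeros).
function objectIdHexForDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  const hex = seconds.toString(16).padStart(8, "0");
  return hex + "0000000000000000";
}

const cutoffHex = objectIdHexForDate(new Date("2012-04-19T19:01:58.841Z"));
console.log(cutoffHex); // 4f9061260000000000000000

// In the mongo shell, the cutoff could then be used roughly like this
// ("mycollection" is a placeholder name):
//   db.mycollection.remove({ _id: { $lt: ObjectId(cutoffHex) } })
```

Because _id is always indexed, the $lt condition can use that index to find the matching documents, although each matched document is still removed individually.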

Going forward, if your application saves data based on time and deletes it once it reaches a certain age, you may find it preferable to store your documents in separate collections: one for each day, week, or whatever time frame makes the most sense for your application. Dropping an entire collection requires far less overhead than removing individual documents, because it is a single operation. db.<collection>.remove({query}) performs a write operation for each document matched, which, as you have observed, may be prohibitively slow for a large number of documents.
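One way the per-day layout might look is sketched below. The naming scheme (prefix plus UTC date), the "events" prefix, and both helper names are assumptions made for illustration; the shell calls in the trailing comment are shown only as a hint of how the result could be used.

```javascript
// Hypothetical naming scheme: one collection per UTC day, e.g. "events_20120419".
function collectionNameForDate(prefix, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}_${y}${m}${d}`;
}

// Return the daily collections that are strictly older than the cutoff day.
// Zero-padded dates sort correctly as plain strings.
function expiredCollections(names, prefix, cutoffDate) {
  const cutoffName = collectionNameForDate(prefix, cutoffDate);
  return names.filter((n) => n.startsWith(prefix + "_") && n < cutoffName);
}

const names = ["events_20120417", "events_20120418", "events_20120419", "users"];
const cutoff = new Date("2012-04-19T19:01:58Z");
console.log(expiredCollections(names, "events", cutoff));

// In the mongo shell, the list could come from db.getCollectionNames(),
// and each expired collection could be dropped with
//   db.getCollection(name).drop()
```

With this layout, expiring a day of data becomes one drop per collection instead of one write per document.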

answered Oct 03 '22 11:10 by Marc