In my current project we use MongoDB to store a large number of documents (approximately 100 billion). How can I remove the oldest half of the documents using the _id field? If I use the indexed field "timestamp", the operation would take about 3 years to complete at the current speed.
In MongoDB, you can use the $unset field update operator to completely remove a field from a document. The $unset operator is designed specifically to delete a field and its value from the document.
The remove() method removes documents from a collection. It can remove either one or all of the documents that match the given query expression. To delete all documents in a collection, pass an empty document ({}) as the query. The justOne parameter is optional: set it to true to limit the deletion to a single document, or omit it to use the default value of false and delete all documents matching the deletion criteria.
Here is a link to a MongoDB-User Google Groups post that discusses generating ObjectIds based on time stamps: http://groups.google.com/group/mongodb-user/browse_thread/thread/262223bb0bd52a83/3fd9b01d0ad2c41b
From the post: extracting the time stamp from MongoDB ObjectIds is explained in the MongoDB documentation page "Optimizing Object IDs": http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield
Because the first four bytes of an ObjectId hold its creation time in seconds since the Unix epoch, an ObjectId can be built from a Unix time in seconds, as in this example taken from the post:
> now = new Date()
ISODate("2012-04-19T19:01:58.841Z")
> ms = now.getTime()
1334862118841
> sec = Math.floor(ms/1000)
1334862118
> hex = sec.toString(16)
4f906126
> id_string = hex + "0000000000000000"
4f9061260000000000000000
> my_id = ObjectId(id_string)
ObjectId("4f9061260000000000000000")
Using the above formula, you can create an ObjectId from any date and then query for documents whose _id is less than that ObjectId.
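The shell steps above can be wrapped in a small helper. This is only a sketch: the function name objectIdHexFromDate is made up for illustration, it assumes the standard ObjectId layout (a 4-byte seconds prefix) and a date on or after the Unix epoch, and the remove() call in the trailing comment is shown for context rather than executed.

```javascript
// Build the 24-character hex string of the smallest ObjectId for a given time.
// Assumption: the first 4 bytes of an ObjectId are Unix time in seconds.
// objectIdHexFromDate is a hypothetical helper name, not a MongoDB API.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  // Pad to 8 hex digits so small timestamps still yield 24 characters in total.
  const prefix = seconds.toString(16).padStart(8, "0");
  // The remaining 8 bytes (16 hex digits) are zeroed.
  return prefix + "0".repeat(16);
}

const cutoffHex = objectIdHexFromDate(new Date("2012-04-19T19:01:58.841Z"));
console.log(cutoffHex); // 4f9061260000000000000000

// In the mongo shell (not run here; <collection> is a placeholder):
//   db.<collection>.remove({ _id: { $lt: ObjectId(cutoffHex) } })
```

Since _id is always indexed, a range query like this can use the _id index instead of a separate timestamp index. Note that it relies on the _id values having been generated as default ObjectIds at insertion time.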
Going forward, if your application saves data based on time and deletes data once it reaches a certain age, you may find it preferable to store your documents in separate collections: one for each day, week, or whatever time frame makes the most sense for your application. Dropping an entire collection requires far less overhead than removing individual documents, because it can be done with a single operation. By contrast, db.<collection>.remove({query}) performs a write operation for each document matched, which, as you have observed, may be prohibitively slow for a large number of documents.
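One way to picture the collection-per-time-frame idea is sketched below. The "events_YYYYMMDD" naming scheme, the per-day granularity, and both function names are assumptions made for illustration only; the drop call in the trailing comment is shown for context and not executed.

```javascript
// Hypothetical naming scheme: one collection per UTC day, e.g. "events_20120419".
function collectionNameFor(date, prefix = "events") {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}_${y}${m}${d}`;
}

// Return the collection names that fall strictly before the cutoff day.
// Zero-padded YYYYMMDD suffixes sort lexicographically in date order.
function expiredCollections(names, cutoffDate, prefix = "events") {
  const cutoffName = collectionNameFor(cutoffDate, prefix);
  return names.filter((n) => n.startsWith(prefix + "_") && n < cutoffName);
}

// In the mongo shell, each expired name could then be dropped in one operation:
//   db.getCollection(name).drop()
```

The application would write each document to collectionNameFor(new Date()) and periodically drop whatever expiredCollections returns for the chosen retention cutoff.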