
How to delete documents by query efficiently in mongo?

Tags: mongodb

I have a query that selects the documents to be removed. Right now I remove them one by one, like this (using Python):

for id in mycoll.find(query, fields={}):
  mycoll.remove(id)

This does not seem to be very efficient. Is there a better way?

EDIT

OK, I owe an apology for forgetting to mention the query details, because they matter. Here is the complete Python code:

def reduce_duplicates(mydb, max_group_size):
  # 1. Count the group sizes
  res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response = True)
  # 2. For each entry from the filter scratch collection having count > max_group_size
  deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
  for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
    key = entry['_id']
    group_size = int(entry['value'])
    # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
    for id in mydb.static.find(key, limit = group_size - max_group_size, **deleteFindArgs):
      mydb.static.remove(id)
  return res['counts']['input']

So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:

  1. Map-reduce the data to (key, count) pairs.
  2. Iterate over all the pairs with count > max_group_size.
  3. Query the data by key, sorting it ascending by the timestamp (oldest first) and limiting the result to the count - max_group_size oldest records.
  4. Delete each and every record found.

As you can see, this reduces the duplicates to at most the N newest records. The last two steps are a foreach-found-remove loop, and that is the important detail of my question. It changes everything, and I should have been more specific about it. Sorry.

Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do it with remove? Well, I have tried:

mydb.static.find(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])

This attempt fails miserably. Moreover, it seems to screw up mongo. Observe:

C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database

Needless to say, the foreach-found-remove approach works and yields the expected results.

Now, I hope I have provided enough context and (hopefully) have restored my lost honour.

asked Apr 04 '12 15:04 by mark


2 Answers

You can use a query to remove all matching documents:

var query = {name: 'John'};
db.collection.remove(query);

Be wary, though: if the number of matching documents is high, your database might become less responsive. It is often advised to delete documents in smaller chunks.

Let's say you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than 1 query that deletes all 100k documents.
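A minimal Python sketch of that chunked approach, under a few assumptions: coll is a pymongo Collection from pymongo 3 or later (where delete_many replaces the older remove), and the helper name delete_in_chunks and the default chunk size are made up for illustration. It fetches only the _id of each batch, then removes the batch with a single $in query.

```python
def delete_in_chunks(coll, query, chunk_size=1000):
    """Delete documents matching `query` in batches of at most `chunk_size`.

    Assumption: `coll` is a pymongo Collection (pymongo 3 or later).
    Returns the total number of deleted documents.
    """
    total = 0
    while True:
        # Fetch only the _id of the next batch of matching documents.
        batch = coll.find(query, {'_id': 1}, limit=chunk_size)
        ids = [doc['_id'] for doc in batch]
        if not ids:
            break
        # One round trip removes the whole batch.
        result = coll.delete_many({'_id': {'$in': ids}})
        if result.deleted_count == 0:
            # Nothing was removed; stop rather than loop forever.
            break
        total += result.deleted_count
    return total
```

Called as delete_in_chunks(db.collection, {'name': 'John'}), this would issue one delete per 1000 matching documents instead of one large delete.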

answered Sep 26 '22 18:09 by Sergio Tulentsev


You can remove it directly from the MongoDB shell:

db.mycoll.remove({_id:'your_id_here'});
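For the sort-and-limit case in the question, the same idea can be stretched to many ids at once: collect the _id values of the oldest documents with find, then remove them in one call with an $in filter. The following is only a sketch under assumptions: coll is a pymongo Collection from pymongo 3 or later (where delete_many replaces remove), and the helper name remove_oldest is hypothetical.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def remove_oldest(coll, key, excess):
    """Remove the `excess` oldest documents matching `key` with one delete.

    Assumptions: `coll` is a pymongo Collection (pymongo 3 or later) and
    `key` is a query document, as in the question's loop.
    """
    if excess <= 0:
        # limit=0 means "no limit" in pymongo, so bail out explicitly.
        return 0
    cursor = coll.find(key, {'_id': 1},
                       sort=[('test_date', ASCENDING)], limit=excess)
    ids = [doc['_id'] for doc in cursor]
    if not ids:
        return 0
    # A single query deletes every selected document.
    return coll.delete_many({'_id': {'$in': ids}}).deleted_count
```

In the question's loop, the inner for could then become remove_oldest(mydb.static, key, group_size - max_group_size), which costs two round trips per group instead of one per document.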
answered Sep 25 '22 18:09 by Pablo Santa Cruz