Delete result of mongodb $lookup aggregation

How do I delete all the chunks documents that are returned by this aggregation?

db.getCollection('chunks').aggregate([
    {
      $lookup:
        {
          from: "files",
          localField: "files_id",
          foreignField: "_id",
          as: "file"
        }
   },
   {
     $match:
       {
         "file.uploadDate":
           {
             $lt: ISODate("2017-06-10T00:00:00.000Z")
           }
       }
   }
])

My schema has a collection named files, which contains file metadata (name, uploadDate), and chunks, which contains the actual data (binary data, files_id).
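
For reference, the documents look roughly like this (a minimal sketch; the ids, file name, and the data field name are illustrative):

// files: one metadata document per file
{ "_id": ObjectId("594a22d1e29e7d1c7c3d07aa"), "name": "report.pdf", "uploadDate": ISODate("2017-05-01T00:00:00.000Z") }

// chunks: the actual binary data, pointing back to files via files_id
{ "_id": ObjectId("594a22d1e29e7d1c7c3d07ab"), "files_id": ObjectId("594a22d1e29e7d1c7c3d07aa"), "data": BinData(0, "SGVsbG8=") }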

I am aware of db.collection.deleteMany({}), but it accepts only a query filter.
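
For example, the following matches nothing, because the file field is produced by the $lookup stage and is not stored in the chunks documents:

// does NOT work: "file" only exists in the aggregation output
db.getCollection('chunks').deleteMany({
  "file.uploadDate": { $lt: ISODate("2017-06-10T00:00:00.000Z") }
})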

I am using MongoDB 3.2.

asked Dec 14 '22 by Babken Vardanyan

2 Answers

Loop the results:

var ops = [];

db.getCollection('chunks').aggregate([
    {
      $lookup:
        {
          from: "files",
          localField: "files_id",
          foreignField: "_id",
          as: "file"
        }
   },
   {
     $match:
       {
         "file.uploadDate":
           {
             $lt: ISODate("2017-06-10T00:00:00.000Z")
           }
       }
   }
]).forEach(doc => {
  // queue a delete for each matched chunk
  ops = [
    ...ops,
    { "deleteOne": {
       "filter": { "_id": doc._id }
    }}
  ];
  // send to the server in batches of 1000
  if ( ops.length >= 1000 ) {
    db.getCollection('chunks').bulkWrite(ops);
    ops = [];
  }
});

// flush whatever is left in the final, partial batch
if ( ops.length > 0 ) {
  db.getCollection('chunks').bulkWrite(ops);
  ops = [];
}

Or in environments without ES6:

var ops = [];

db.getCollection('chunks').aggregate([
    {
      $lookup:
        {
          from: "files",
          localField: "files_id",
          foreignField: "_id",
          as: "file"
        }
   },
   {
     $match:
       {
         "file.uploadDate":
           {
             $lt: ISODate("2017-06-10T00:00:00.000Z")
           }
       }
   }
]).forEach(function(doc) {

  ops.push({ "deleteOne": { "filter": { "_id": doc._id }  } });

  if ( ops.length >= 1000 ) {
    db.getCollection('chunks').bulkWrite(ops);
    ops = [];
  }
});

if ( ops.length > 0 ) {
  db.getCollection('chunks').bulkWrite(ops);
  ops = [];
}

Using .bulkWrite() you are basically "batching" the requests in lots of 1000, so the actual writes and responses from the database happen only once per batch and not once per entry. As it happens, 1000 is also the maximum write batch size the server accepts in MongoDB 3.2, so larger lists would be split up anyway.

You cannot supply an aggregation pipeline as the query argument to the general .remove() methods, so what you do instead is loop the cursor and issue the deletes yourself, as shown above.
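
If you want to know how much was actually removed, .bulkWrite() returns a result containing a deletedCount. A minimal sketch, assuming the same chunks collection as above (flushOps and deleted are illustrative names, not part of the original answer):

var deleted = 0;

// illustrative helper: flush a batch and tally the server's deletedCount
function flushOps(ops) {
  if (ops.length > 0) {
    var result = db.getCollection('chunks').bulkWrite(ops);
    deleted += result.deletedCount;
  }
  return [];  // start a fresh batch
}

Calling ops = flushOps(ops) in place of each bulkWrite()/reset pair above leaves the running total in deleted.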

answered Dec 16 '22 by Neil Lunn


After you get the aggregation result, you can use the cursor's map function to collect all the chunk ids, and then use db.collection.remove() with the $in operator.

var pipeline = [
  {$lookup:{
      from: "files",
      localField: "files_id",
      foreignField: "_id",
      as: "file"
    }
  },
  {$match:{
      "file.uploadDate":
      {
        $lt: ISODate("2017-06-10T00:00:00.000Z")
      }
    }
  }
];

var cursor = db.chunks.aggregate(pipeline);
// collect the _id of every matched chunk into an array
var chunkIds = cursor.map(function (chunk) { return chunk._id; });
// remove them all with a single $in query
db.chunks.remove({ "_id": { "$in": chunkIds } });
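
Note that this loads every matching _id into memory and sends them in a single query document, which is fine for moderate result sets; for very large ones the batched bulkWrite() approach in the other answer scales better.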

answered Dec 16 '22 by Shaishab Roy