Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate duplicate documents with values in array in Mongo

I have a large collection of documents that look as follows:

{ "_id": "5a760191813a54000b8475f1", "orders": [{ "row": "3", "seat": "11" }, { "row": "3", "seat": "12" }], "product_id": "5a7628bedbcc42000aa7f614" },
{ "_id": "5a75f6f17abe45000a3ba05e", "orders": [{ "row": "3", "seat": "12" }, { "row": "3", "seat": "13" }], "product_id": "5a7628bedbcc42000aa7f614" },
{ "_id": "5a75ebdf813a54000b8475e7", "orders": [{ "row": "5", "seat": "16" }, { "row": "5", "seat": "15" }], "product_id": "5a75f711dbcc42000c459efc" }

I need to be able to find any documents where the product_id and items in the orders array are duplicates. I can't quite seem to wrap my head around accomplishing this. Any pointers?

like image 960
Malmoc Avatar asked Sep 02 '21 14:09

Malmoc


2 Answers

I don't know what output you want, but this has the information about the duplicates, maybe you want to add unwind on duplicates also.

Result documents

  • product_id
  • order (that found duplicated)
  • duplicates (the documents that had that order as duplicate)

For your data would print

[{
  "duplicates": [
    "5a760191813a54000b8475f1",
    "5a75f6f17abe45000a3ba05e"
  ],
  "order": {
    "row": "3",
    "seat": "12"
  },
  "product_id": "5a7628bedbcc42000aa7f614"
}]

Query
(run it on your driver, MongoPlayground doesn't keep the order of fields and can show wrong results)

aggregate(
[{"$unwind" : {"path" : "$orders"}},
 {
  "$group" : {
    "_id" : {
      "orders" : "$orders",
      "product_id" : "$product_id"
    },
    "duplicates" : {
      "$push" : "$_id"
    }
  }
 },
 {"$match" : {"$expr" : {"$gt" : [ {"$size" : "$duplicates"}, 1 ]}}},
 {
  "$project" : {
    "_id" : 0,
    "order" : "$_id.orders",
    "product_id" : "$_id.product_id",
    "duplicates" : 1
  }
 } 
])

Data (i added some more data)

[
  {
    "_id": "5a760191813a54000b8475f1",
    "orders": [
      {
        "row": "3",
        "seat": "11"
      },
      {
        "row": "3",
        "seat": "12"
      }
    ],
    "product_id": "5a7628bedbcc42000aa7f614"
  },
  {
    "_id": "5a75f6f17abe45000a3ba05g",
    "orders": [
      {
        "row": "3",
        "seat": "12"
      },
      {
        "row": "3",
        "seat": "13"
      }
    ],
    "product_id": "5a7628bedbcc42000aa7f614"
  },
  {
    "_id": "5a75f6f17abe45000a3ba05e",
    "orders": [
      {
        "row": "3",
        "seat": "12"
      },
      {
        "row": "3",
        "seat": "13"
      }
    ],
    "product_id": "5a7628bedbcc42000aa7f614"
  },
  {
    "_id": "5a75ebdf813a54000b8475e7",
    "orders": [
      {
        "row": "5",
        "seat": "16"
      },
      {
        "row": "5",
        "seat": "15"
      }
    ],
    "product_id": "5a75f711dbcc42000c459efc"
  }
]

Results

[{
  "duplicates": [
    "5a75f6f17abe45000a3ba05g",
    "5a75f6f17abe45000a3ba05e"
  ],
  "order": {
    "row": "3",
    "seat": "13"
  },
  "product_id": "5a7628bedbcc42000aa7f614"
},
{
  "duplicates": [
    "5a760191813a54000b8475f1",
    "5a75f6f17abe45000a3ba05g",
    "5a75f6f17abe45000a3ba05e"
  ],
  "order": {
    "row": "3",
    "seat": "12"
  },
  "product_id": "5a7628bedbcc42000aa7f614"
}]
like image 181
Takis Avatar answered Nov 14 '22 23:11

Takis


You could use below query. $unwind the orders array, $group by order row and product and collect matching ids and count. Keep the documents where count is greater than 1. $lookup to pull in the matching documents by id and $replaceRoot to flatten the documents.

db.collection.aggregate([
  {
    "$unwind": "$orders"
  },
  {
    "$group": {
      "_id": {
        "order": "$orders",
        "product_id": "$product_id"
      },
      "count": {
        "$sum": 1
      },
      "doc_ids": {
        "$push": "$_id"
      }
    }
  },
  {
    "$match": {
      "count": {
        "$gt": 1
      }
    }
  },
  {
    "$lookup": {
      "from": "collection",
      "localField": "doc_ids",
      "foreignField": "_id",
      "as": "documents"
    }
  },
  {
    "$unwind": "$documents"
  },
  {
    "$replaceRoot": {
      "newRoot": "$documents"
    }
  }
])

https://mongoplayground.net/p/YbztEGttUMx

like image 28
s7vr Avatar answered Nov 15 '22 01:11

s7vr