
How to find duplicate documents?

Tags: arangodb, aql

I was surprised not to find an answer to a very simple question, either in the documentation or here: how do I find duplicate records in a collection? For example, I need to find the documents that share the same id among the following:

{"id": 1, "name": "Mike"},
{"id": 2, "name": "Jow"},
{"id": 3, "name": "Piter"},
{"id": 1, "name": "Robert"}

I need a query that returns the two documents with the same id (id: 1 in my case).

Dmitry Bubnenkov asked Oct 25 '25 05:10


1 Answer

Have a look at the AQL COLLECT operation. It groups documents by a value and can return the number of documents in each group, which lets you find values that occur more than once, such as your id key.

ArangoDB AQL - COLLECT

You can use LET liberally in AQL to break a query down into smaller steps and reuse each step's output later in the same query.

It may be possible to also collapse it all into one query, but this technique helps break it down.

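// Step 1: group by id and keep only the ids that occur more than once
LET duplicates = (
    FOR d IN myCollection
    COLLECT id = d.id WITH COUNT INTO count
    FILTER count > 1
    RETURN {
        id: id,
        count: count
    }
)

// Step 2: fetch every document whose id is one of the duplicated ids
FOR d IN duplicates
FOR m IN myCollection
FILTER d.id == m.id
RETURN m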

This will return:

[
  {
    "_key": "416140",
    "_id": "myCollection/416140",
    "_rev": "_au4sAfS--_",
    "id": 1,
    "name": "Mike"
  },
  {
    "_key": "416176",
    "_id": "myCollection/416176",
    "_rev": "_au4sici--_",
    "id": 1,
    "name": "Robert"
  }
]
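
As mentioned above, it may be possible to collapse this into a single query. The following is a minimal sketch of one way to do that, assuming the same collection name myCollection and the same id attribute as in the question. The variable names docs and doc are only illustrative. It relies on COLLECT ... INTO with a projection expression, which gathers the grouped documents into an array, so each group can be filtered by its length and then unpacked again.

```aql
FOR d IN myCollection
    // group by id and collect the full documents of each group into "docs"
    COLLECT id = d.id INTO docs = d
    // keep only the groups that contain more than one document
    FILTER LENGTH(docs) > 1
    // unpack each remaining group back into individual documents
    FOR doc IN docs
        RETURN doc
```

Note that this variant holds all documents of every group in memory while grouping, so the two-step version with WITH COUNT INTO may be the better fit for large collections.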
David Thomas answered Oct 26 '25 22:10