Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding duplicates in cosmos db based on property value

I'm working with a cosmos container with ~1 million documents. All documents have unique ids. There is another property in each document that is supposed to be unique. However, there are an unknown number of documents that contain duplicates of this property. Here's an example of the document structure:

{
     "id":"unique-id",    
     "property":"value-that-is-supposed-to-be-unique"
}

I'm trying to get all documents for which "property" is not unique but struggling to come up with a query to do so. Best I've been able to com up with is the following:

SELECT c.property, count(1) FROM c GROUP BY c.property

which returns all "property" values and a count for each, but this is largely inadequate. I'm looking for the best way to get all ids of documents that contain duplicated "property" values.

like image 498
ben Avatar asked Oct 29 '25 19:10

ben


1 Answers

You can use the below query to find the duplicates.

select d.property,d.tot_count from
(SELECT c.property,count(1) as tot_count FROM c
group  by c.property) as d
where d.tot_count>1

I reproduced this with sample data.

Input

[
    {
        "id": "1",
        "property": "p1"
    },
    {
        "id": "2",
        "property": "p1"
    },
    {
        "id": "3",
        "property": "p2"
    }
]

Output

[
    {
        "property": "p1",
        "tot_count": 2
    }
]

enter image description here

like image 175
Aswin Avatar answered Oct 31 '25 14:10

Aswin



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!