Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I find duplicate records in a clickhouse db table

Please, how can I find duplicate data entries in one specific table in clickhouse DB?

I am investigating on a merge tree table and actually threw optimize statements at my table but that didn't do the trick. The duplicate entries still persist.

Preferred would be to have a universal strategy without referencing individual column names.

I only want to see the duplicate entries, since I am working on very large tables. Thank you.

like image 893
Axelfoley Avatar asked Oct 21 '25 07:10

Axelfoley


1 Answers

I compared with mySQL approaches and finally resolved by using this query.

SELECT 
    *, 
    count() AS cnt
FROM myDB.myTable
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC

If that query gets to big I can run it in pieces.

SELECT 
    *,
    count() AS cnt
FROM myDB.myTable
WHERE (date >= '2020-08-01') AND (date < '2020-09-01')
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC
like image 52
Axelfoley Avatar answered Oct 23 '25 22:10

Axelfoley