Questions Linux Laravel Mysql Ubuntu Git Menu

HTML CSS JAVASCRIPT SQL PYTHON PHP BOOTSTRAP JAVA JQUERY R React Kotlin

How to avoid duplicates in clickhouse table?

Tags:

clickhouse

I have created table and trying to insert the values multiple time to check the duplicates. I can see duplicates are inserting. Is there a way to avoid duplicates in clickhouse table?

CREATE TABLE sample.tmp_api_logs ( id UInt32,  EventDate Date) 
ENGINE = MergeTree(EventDate, id, (EventDate,id), 8192);

insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');

select * from sample.tmp_api_logs;
/*
┌─id─┬──EventDate─┐
│  1 │ 2018-11-23 │
│  2 │ 2018-11-23 │
└────┴────────────┘
┌─id─┬──EventDate─┐
│  1 │ 2018-11-23 │
│  2 │ 2018-11-23 │
└────┴────────────┘
*/

like image

747

asked Nov 23 '18 07:11

user3383468

People also ask

How do I remove duplicates in Clickhouse?

ALTER TABLE my_table DELETE WHERE (select *, count() AS cnt from my_table GROUP BY * HAVING cnt > 1 );

2 Answers

Most likely ReplacingMergeTree is what you need as long as duplicate records duplicate primary keys. You can also try out other MergeTree engines for more actions when replicate record is encountered. FINAL keyword can be used when doing queries to ensure uniquity.

like image

147

answered Oct 12 '22 07:10

Amos

If raw data does not contain duplicates and they might appear only during retries of INSERT INTO, there's a deduplication feature in ReplicatedMergeTree. To make it work you should retry inserts of exactly the same batches of data (same set of rows in same order). You can use different replica for these retries and data block will still be inserted only once as block hashes are shared between replicas via ZooKeeper.

Otherwise, you should deduplicate data externally before inserts to ClickHouse or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.

like image

39

answered Oct 12 '22 08:10

Ivan Blinkov

Sign in to Comment

Related questions
                            
                                Return clickhouse array as column
                            
                                How to have auto increment in ClickHouse?
                            
                                Is there a way to join all arrays in clickhouse column and then filter for duplicates?
                            
                                How does one properly edit the clickhouse-server config.xml file?
                            
                                Importing from MySQL dump to Clickhouse
                            
                                Connect to remote clickhouse db via clickhouse command line
                            
                                Replacement for row_number() in clickhouse
                            
                                Understanding clickhouse partitions
                            
                                Filtering results from ClickHouse using values from dictionaries
                            
                                clickhouse - how get count datetime per 1minute or 1day ,
                            
                                How to implement `pivot` in clickhouse just like in dolphindb
                            
                                How make JOIN table in ClickHouse DB faster?
                            
                                How to setup an admin account for Clickhouse?
                            
                                How to create primary keys in ClickHouse
                            
                                How to search the string in query with case insensitive on Clickhouse database?
                            
                                Code: 210. DB::NetException: Connection refused (localhost:9000)
                            
                                Select only rows with max date
                            
                                Clickhouse JDBC driver class name
                            
                                How to create database in database docker container?
                            
                                Change column name in a table in Clickhouse

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With