Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple small inserts in clickhouse

Tags:

clickhouse

I have an event table (MergeTree) in clickhouse and want to run a lot of small inserts at the same time. However the server becomes overloaded and unresponsive. Moreover, some of the inserts are lost. There are a lot of records in clickhouse error log:

01:43:01.668 [ 16 ] <Error> events (Merger): Part 201 61109_20161109_240760_266738_51 intersects previous part

Is there a way to optimize such queries? I know I can use bulk insert for some types of events. Basically, running one insert with many records, which clickhouse handles pretty well. However, some of the events, such as clicks or opens could not be handled in this way.

The other question: why clickhouse decides that similar records exist, when they don't? There are similar records at the time of insert, which have the same fields as in index, but other fields are different.

From time to time I also receive the following error:

Caused by: ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, message: Connect to localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out, host: localhost, port: 8123; Connect to ip6-localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out
    ... 36 more

Mostly during project build when test against clickhouse database are run.

like image 294
alniks Avatar asked Nov 14 '16 15:11

alniks


People also ask

How do you insert data into ClickHouse?

Inserts data into a table. INSERT INTO [db.] table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ... You can specify a list of columns to insert using the (c1, c2, c3) .

How do I configure ClickHouse?

From Precompiled Binaries for Non-Standard Environments​Run sudo ./clickhouse install to install ClickHouse system-wide (also with needed configuration files, configuring users etc.). Then run sudo clickhouse start commands to start the clickhouse-server and clickhouse-client to connect to it.

What is ClickHouse cluster?

ClickHouse clusters are one or more database hosts that replication can be configured between. Note. The number of hosts you can create together with a ClickHouse cluster depends on the selected storage type and host class.


1 Answers

Clickhouse has special type of tables for this - Buffer. It's stored in memory and allow many small inserts with out problem. We have near 200 different inserts per second - it works fine.

Buffer table:

CREATE TABLE logs.log_buffer (rid String, created DateTime, some String, d Date MATERIALIZED toDate(created))
ENGINE = Buffer('logs', 'log_main', 16, 5, 30, 1000, 10000, 1000000, 10000000);

Main table:

CREATE TABLE logs.log_main (rid String, created DateTime, some String, d Date) 
ENGINE = MergeTree(d, sipHash128(rid), (created, sipHash128(rid)), 8192);

Details in manual: https://clickhouse.yandex/docs/en/operations/table_engines/buffer/

like image 188
Alexander Suvorov Avatar answered Sep 21 '22 07:09

Alexander Suvorov