I have two tables
event
params
This table have relation 1-1 by id. If execute query
select count(*)
from
(select id from event where os like 'Android%')
inner join
(select id from params where sx >= 1024)
using id
they very slow
But if all data contains in one table
select count(*) from event where sx >= 1024 and os like 'Android%'
Query executed very fast.
Please, tell me how use join in ClickHouse DB effective? Keep all data in one table is not convenient.
Supported Types of JOINAll standard SQL JOIN types are supported: INNER JOIN , only matching rows are returned. LEFT OUTER JOIN , non-matching rows from left table are returned in addition to matching rows. RIGHT OUTER JOIN , non-matching rows from right table are returned in addition to matching rows.
Under the same conditions, ClickHouse can handle several hundred queries per second on a single server (up to several thousand in the best case). Since this scenario is not typical for analytical DBMSs, we recommend expecting a maximum of 100 queries per second.
Fast refresh for a materialized view containing only joins is possible after any type of DML to the base tables (direct-path or conventional INSERT , UPDATE , or DELETE ). A materialized view containing only joins can be defined to be refreshed ON COMMIT or ON DEMAND .
From SELECT queryCREATE TABLE [IF NOT EXISTS] [db.] table_name[(name1 [type1], name2 [type2], ...)] ENGINE = engine AS SELECT ... Creates a table with a structure like the result of the SELECT query, with the engine engine, and fills it with data from SELECT .
Clickhouse is an analytic column-based RDBMS. It's not a timeseries database. Each class of product is used to solve different problems. Time-series data is just data where every record has a "time" field.
I experience same problems with joining 2 huge distributed tables. There are 2 main problems
What works for me is sharding calculation query by id%N using subqueries and union all results then.
SELECT count(*)
FROM
(
SELECT 1
FROM event
WHERE id%2=0 AND id IN
(
SELECT id
FROM params
WHERE id % 2 = 0 AND sx >= 1024
)
UNION ALL
SELECT 2
FROM event
WHERE id % 2 = 1 AND id IN
(
SELECT id
FROM params
WHERE id % 2 = 1 AND sx >= 1024
)
)
You can change id%N(2 in the example) until you get needed performance. Need to replace IN to GLOBAL IN if you use distributed engines for tables.
You may rewrite query like this:
select count(*)
from event
where os like 'Android%'
AND id IN (select id from params where sx >= 1024)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With