Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filtering results from ClickHouse using values from dictionaries

I'm a little unfamiliar with ClickHouse and still study it by trial and error. Got a question about it.

Talking about the star scheme of data representations, with dimensions and facts. Currently, I keep everything in PostgreSQL, but OLAP queries with aggregations start to show bad timing, so I'm going to move some fact tables to ClickHouse. Initial tests of CH show incredible performance, however, in real life the queries should include joins to dimension tables from PostgreSQL. I know I can connect them as dictionaries.

Question: I found that using dictionaries I can make requests similar to LEFT JOINs in good old RDBMS, ie values from resultset could be joined with corresponding values from the dictionary. But can they be filtered by some restrictions on dictionary keys (as in INNER JOIN)? For example, in PostgreSQL I have a table users (id, name, ...) and in ClickHouse I have table visits (user_id, source, medium, session_time, timestamp, ...) with metrics about their visits to the site. Can I make a query to CH to fetch aggregated metrics (number of daily visits for given date range) of users which name matches some condition (LIKE "EVE%" for example)?

like image 859
Ilya Konyukhov Avatar asked Oct 14 '18 03:10

Ilya Konyukhov


2 Answers

It sounds like ODBC table function is what you're looking for. ClickHouse have a bunch of table functions which work like Postgres foreign tables. The setup is similar to Dictionaries but you gain the traditional JOIN behavior. It currently doesn't show up in the official document. You can refer to this https://github.com/yandex/ClickHouse/blob/master/dbms/tests/integration/test_odbc_interaction/test.py#L84 . And in near future (this year), ClickHouse will have standard JOIN statement supported.

like image 57
Amos Avatar answered Oct 30 '22 02:10

Amos


The dictionary will basically replace the value first. As I understand it your dictionary would be based off your users table.

Here is an example. Hopefully I am understanding your question.

select dictGetString('accountidmap', 'domain', tuple(toString(account_id))) AS domain, sum(session) as sessions from session_distributed where date = '2018-10-15' and like(domain, '%cats%') group by domain

This is a real query on our database so If there is something you want to try/confirm let me know

like image 45
ozzieisaacs Avatar answered Oct 30 '22 00:10

ozzieisaacs