I'm a little unfamiliar with ClickHouse and still study it by trial and error. Got a question about it.
Talking about the star scheme of data representations, with dimensions and facts. Currently, I keep everything in PostgreSQL, but OLAP queries with aggregations start to show bad timing, so I'm going to move some fact tables to ClickHouse. Initial tests of CH show incredible performance, however, in real life the queries should include joins to dimension tables from PostgreSQL. I know I can connect them as dictionaries.
Question: I found that using dictionaries I can make requests similar to LEFT JOINs in good old RDBMS, ie values from resultset could be joined with corresponding values from the dictionary. But can they be filtered by some restrictions on dictionary keys (as in INNER JOIN)? For example, in PostgreSQL I have a table users (id, name, ...)
and in ClickHouse I have table visits (user_id, source, medium, session_time, timestamp, ...)
with metrics about their visits to the site. Can I make a query to CH to fetch aggregated metrics (number of daily visits for given date range) of users which name
matches some condition (LIKE "EVE%"
for example)?
It sounds like ODBC table function is what you're looking for. ClickHouse have a bunch of table functions which work like Postgres foreign tables. The setup is similar to Dictionaries but you gain the traditional JOIN behavior. It currently doesn't show up in the official document. You can refer to this https://github.com/yandex/ClickHouse/blob/master/dbms/tests/integration/test_odbc_interaction/test.py#L84 . And in near future (this year), ClickHouse will have standard JOIN statement supported.
The dictionary will basically replace the value first. As I understand it your dictionary would be based off your users table.
Here is an example. Hopefully I am understanding your question.
select dictGetString('accountidmap', 'domain', tuple(toString(account_id))) AS domain, sum(session) as sessions from session_distributed where date = '2018-10-15' and like(domain, '%cats%') group by domain
This is a real query on our database so If there is something you want to try/confirm let me know
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With