How understand the granularity and block in ClickHouse?

Tags:

clickhouse

I am not clear about these two words. Whether does one block have a fixed number of rows? Whether is one block the minimum unit to read from disk? Whether are different blocks stored in different files? Whether is the range of one block bigger than granule? That means, one block can have several granules skip indices.

377

asked Feb 17 '20 04:02

GOGO

1 Answers

https://clickhouse.tech/docs/en/operations/table_engines/mergetree/#primary-keys-and-indexes-in-queries

Primary key is sparsed. By default it contains 1 value of each 8192 rows (= 1 granule).

Let's disable adaptive granularity (for the test) -- index_granularity_bytes=0

create table X (A Int64) 
Engine=MergeTree order by A 
settings index_granularity=16,index_granularity_bytes=0;

insert into X select * from numbers(32);

index_granularity=16 -- 32 rows = 2 granule , primary index have 2 values 0 and 16

select marks, primary_key_bytes_in_memory from system.parts where table = 'X';
┌─marks─┬─primary_key_bytes_in_memory─┐
│     2 │                          16 │
└───────┴─────────────────────────────┘

16 bytes === 2 values of INT64.

Adaptive index granularity means that granules size various. Because wide rows (many bytes) needs (for performance) fewer (<8192) rows in granule.

index_granularity_bytes = 10MB ~ 1k row * 8129. So each granule have 10MB. If rows size 100k (long Strings), granule will have 100 rows (not 8192).

Skip index granules GRANULARITY 3 -- means that an index will store one value for each 3 table granules.

create table X (A Int64, B Int64, INDEX IX1 (B) TYPE minmax GRANULARITY 4) 
Engine=MergeTree order by A 
settings index_granularity=16,index_granularity_bytes=0;

insert into X select number, number from numbers(128);

128/16 = 8, table have 8 granules, INDEX IX1 stores 2 values of minmax (8/4)

So minmax index stores 2 values -- (0..63) and (64..128)

0..63 -- points to the first 4 table's granules.

64..128 -- points to the second 4 table' granules.

set send_logs_level='debug'
select * from X where B=77
[ 84 ] <Debug> dw.X (SelectExecutor): **Index `IX1` has dropped 1 granules**
[ 84 ] <Debug> dw.X (SelectExecutor): Selected 1 parts by date, 1 parts by key, **4 marks** to read from 1 ranges

SelectExecutor checked skip index - 4 table granules can be skipped because 77 is not in 0..63 . And another 4 granules must be read ( 4 marks ) because 77 in (64..128) -- some of that 4 granules have B=77.

174

answered Sep 24 '22 06:09

Denny Crane

Related questions
                            
                                Understanding clickhouse partitions
                            
                                Filtering results from ClickHouse using values from dictionaries
                            
                                clickhouse - how get count datetime per 1minute or 1day ,
                            
                                How to implement `pivot` in clickhouse just like in dolphindb
                            
                                How make JOIN table in ClickHouse DB faster?
                            
                                How to setup an admin account for Clickhouse?
                            
                                How to create primary keys in ClickHouse
                            
                                How to search the string in query with case insensitive on Clickhouse database?
                            
                                Code: 210. DB::NetException: Connection refused (localhost:9000)
                            
                                Select only rows with max date
                            
                                Clickhouse JDBC driver class name
                            
                                How to create database in database docker container?
                            
                                Change column name in a table in Clickhouse
                            
                                How to avoid duplicates in clickhouse table?
                            
                                Clickhouse string field disk usage: null vs empty
                            
                                Clickhouse as time-series storage
                            
                                Can I query per hour increment of a accumulation column in clickhouse?
                            
                                Inserting String Array through CSV format in ClickHouse db
                            
                                ClickHouse: How to store JSON data the right way?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With