I'm designing a schema for a large ClickHouse table with string fields that can be pretty sparse.
I'm wondering whether these fields should be Nullable or whether I should store an empty string ""
as the default value. Which would be better in terms of storage?
From SELECT query: CREATE TABLE [IF NOT EXISTS] [db.]table_name[(name1 [type1], name2 [type2], ...)] ENGINE = engine AS SELECT ... creates a table with a structure like the result of the SELECT query, with the specified engine, and fills it with data from the SELECT.
The nullability of a field defines whether the field can contain a null. A field containing a null does not contain a value. For example, you can have a data set whose record schema contains an age field. If the age field of a particular record is null, the age is not known for the person corresponding to the record.
In ClickHouse, NULL and NOT NULL do change the behavior of the data type, but not in the way they do in other relational databases: the syntax is compatible with other relational databases, but the semantics are not (an Int32 NULL is the same as a Nullable(Int32), and an Int32 NOT NULL is the same as an Int32).
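The equivalence between the NULL / NOT NULL modifiers and the Nullable wrapper can be checked directly. The sketch below uses a throwaway table named null_syntax_demo, which is a hypothetical name chosen for illustration; it assumes the table is created in the current database.

```sql
-- Hypothetical demo table: column a uses the NULL modifier,
-- column b uses NOT NULL, column c spells out Nullable explicitly.
CREATE TABLE null_syntax_demo
(
    a Int32 NULL,
    b Int32 NOT NULL,
    c Nullable(Int32)
)
ENGINE = Memory;

-- Inspect the types ClickHouse actually recorded for each column.
SELECT name, type
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'null_syntax_demo';

DROP TABLE null_syntax_demo;
```

If the modifiers behave as described, columns a and c should both be reported as Nullable(Int32), and b as plain Int32.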
An empty string is useful when the data comes from multiple sources. NULL is used when some fields are optional and the data is unknown.
You should store an empty string "".
A Nullable column takes more disk space and can slow queries down by up to two times. This is expected behaviour by design.
Inserts slow down as well, because each Nullable column is stored in 4 files, whereas a non-Nullable column is stored in only 2.
https://gist.github.com/den-crane/e43f8d0ad6f67ab9ffd09ea3e63d98aa
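One way to see the storage difference on your own data is to load the same sparse values into two otherwise identical tables and compare the per-column sizes. This is a minimal sketch, not the benchmark from the gist: the table names t_nullable and t_default, the 10,000,000-row volume, and the "one non-empty value in ten" sparsity are all assumptions made for illustration.

```sql
-- Hypothetical table with a Nullable string column.
CREATE TABLE t_nullable
(
    id UInt64,
    s  Nullable(String)
)
ENGINE = MergeTree
ORDER BY id;

-- Same shape, but missing values are stored as an empty string.
CREATE TABLE t_default
(
    id UInt64,
    s  String DEFAULT ''
)
ENGINE = MergeTree
ORDER BY id;

-- Fill both tables with the same sparse data: 1 row in 10 has a value.
INSERT INTO t_nullable
SELECT number, if(number % 10 = 0, toString(number), NULL)
FROM numbers(10000000);

INSERT INTO t_default
SELECT number, if(number % 10 = 0, toString(number), '')
FROM numbers(10000000);

-- Compare on-disk size of column s in the two tables.
SELECT
    table,
    name,
    type,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase()
  AND table IN ('t_nullable', 't_default')
  AND name = 's';
```

The exact numbers depend on the codec, the data distribution, and the ClickHouse version, so treat the result as a measurement of your setup rather than a general figure. Running the same SELECT with a filter on s against each table gives a comparable check for query speed.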