During a migration from MySQL into a Citus cluster, I used the range
distribution method. The migration is complete, but now I'd like to change the distribution method to hash
.
Is there a way to change the distribution method from range
to hash
for an existing table with data already in it?
I came up with the following procedure, but am not sure it's valid:
minvalue
and maxvalue
columns of the pg_dist_shard
table for all shards being changedpg_dist_partition
table from r
to h
COMMIT;
That is a good question. Currently, Citus does not provide a direct way to change partition type of existing data.
In range partitioning, records are placed in shards according to their partition column value and shard min/max values. If a record x resides in shard y, then it means y.minvalue <= x.partition_column <= y.maxvalue
.
In hash partitioning, the partition column is hashed and records are routed according to this hashed value. Therefore, min/max values you see in pg_dist_shard
are the boundary values for the result of the hash function. In this case y.minvalue <= hash(x.partition_column) <= y.maxvalue
.
Therefore, doing the changes you have mentioned would end up with an incorrect distribution. In order to switch from range partition to hash partition, the data should be re-distributed. To do that, I suggest reloading the data to an empty hash-partitioned table.
For more information, you can refer to Working with Distributed Tables and Hash Distribution sections of Citus Documentation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With