
Are Cassandra user defined data types recommended in view of performance?

I have a Cassandra Customers table which is going to keep a list of customers. Every customer has an address which is a list of standard fields:

{
   CustomerName: "",
   etc...,
   Address: {
              street: "",
              city: "",
              province: "",
              etc...
            }
}

My question is: if I have a million customers in this table and I use a user defined data type Address to keep the address information for each customer in the Customers table, what are the implications of such a model, especially in terms of disk space? Is this going to be very expensive? Should I use the Address user defined data type, flatten the address information into regular columns, or even use a separate table?

Milen Kovachev asked Oct 20 '22 05:10


1 Answer

In this case, Cassandra serializes each Address instance into a single blob, which is stored as one column in your Customers table. I don't have numbers at hand for how much overhead the serialization adds in disk space or CPU usage, but it probably won't make a big difference for your use case. You should test both models to be sure.
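As a rough way to reason about the size overhead, here is a minimal Python sketch. It assumes the UDT value is encoded as the concatenation of its fields, each prefixed by a 4-byte length (with -1 marking a null field); treat that layout as an assumption to verify against your Cassandra version. The function name and the sample address values are hypothetical, and the result ignores compression and the per-cell overhead that flat columns carry as well, so it is not a substitute for measuring real tables.

```python
import struct

def encode_udt_fields(fields):
    """Hypothetical encoder: each field as [4-byte big-endian length][UTF-8 bytes].

    A None field is written as length -1 with no payload (assumed layout).
    """
    out = bytearray()
    for value in fields:
        if value is None:
            out += struct.pack(">i", -1)
        else:
            raw = value.encode("utf-8")
            out += struct.pack(">i", len(raw)) + raw
    return bytes(out)

# Illustrative address: street, city, province (made-up values).
address = ["1 Main St", "Sofia", "Sofia City"]

blob = encode_udt_fields(address)
payload = sum(len(v.encode("utf-8")) for v in address)

# Bytes added by the length prefixes alone, per row.
overhead_per_row = len(blob) - payload

# Scaled to a million customers, before any compression.
overhead_total = overhead_per_row * 1_000_000

print(overhead_per_row, overhead_total)
```

Under these assumptions the framing cost is a few bytes per field, which is small next to the address text itself.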

Edit: Another aspect I should have mentioned: because the UDT is handled as a single blob, any update has to replace the complete UDT. This is less efficient than updating individual columns and is a potential cause of inconsistencies: with concurrent updates, the two writes can overwrite each other's changes. See CASSANDRA-7423.
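The lost-update risk can be illustrated with a small simulation. This is plain Python using dicts, not Cassandra driver code: the row layout, field names, and values are hypothetical, and "last write wins" is modeled simply by the order of assignments.

```python
# Model 1: the address is one opaque value that must be replaced as a whole.
row_udt = {"address": {"street": "Old St", "city": "Old City"}}

# Two clients read the same snapshot of the address.
client_a = dict(row_udt["address"])
client_b = dict(row_udt["address"])

# Each client changes a different field in its own copy.
client_a["street"] = "New St"
client_b["city"] = "New City"

# Both write the complete value back; the later write wins.
row_udt["address"] = client_a
row_udt["address"] = client_b

udt_result = row_udt["address"]

# Model 2: flat columns, each updated independently.
row_flat = {"street": "Old St", "city": "Old City"}
row_flat["street"] = "New St"    # client A touches only its column
row_flat["city"] = "New City"    # client B touches only its column

flat_result = row_flat

print(udt_result)   # client A's street change is gone
print(flat_result)  # both changes survive
```

In the whole-value model, client A's street change is silently overwritten by client B's write, while the flat model keeps both. Where your Cassandra version supports updating individual UDT fields, the behavior should be closer to the flat case; check this against your version's documentation.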

Stefan Podkowinski answered Oct 29 '22 22:10