I understand that text and varchar are aliases, and that both store UTF-8 strings. What about ascii, which the documentation describes as a "US-ASCII character string"? What's the difference besides the encoding?
Is there any size difference? Is there a preferred choice between the two when I'm storing large strings (~500 KB)?
Regarding this answer:
If the data is a piece of text, for example a String in Java, it is encoded as UTF-16 at runtime, but when serialized to Cassandra with the text type it is stored as UTF-8. UTF-16 always uses 2 bytes per character (and sometimes 4), while UTF-8 is more space efficient and uses 1, 2, 3 or 4 bytes depending on the character.
That means there is CPU work to encode/decode such data during serialization. The content of the text also matters: a value like 158786464563 stored as text takes 12 bytes, one per digit character, which means more space is used and more IO as well.
Note that Cassandra also offers the ascii type, which follows the US-ASCII character set and always uses 1 byte per character.
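For illustration, here is a minimal Java sketch that compares the encoded sizes of the same strings under the three encodings discussed above (the sample values are made up for demonstration; they are not tied to any particular Cassandra schema):

    import java.nio.charset.StandardCharsets;

    public class EncodingSizes {
        public static void main(String[] args) {
            String word = "café";            // contains one non-ASCII character
            String digits = "158786464563";  // 12 digit characters

            // UTF-8: 1 byte per ASCII character, 2+ bytes for others -> 5 bytes here
            System.out.println(word.getBytes(StandardCharsets.UTF_8).length);

            // UTF-16 (Java adds a 2-byte byte-order mark): 4 chars * 2 bytes + 2 -> 10 bytes
            System.out.println(word.getBytes(StandardCharsets.UTF_16).length);

            // US-ASCII: always 1 byte per character -> 12 bytes for the digit string
            // (characters outside US-ASCII cannot be represented and get replaced)
            System.out.println(digits.getBytes(StandardCharsets.US_ASCII).length);
        }
    }

Running it prints 5, 10, and 12, which matches the per-character sizes described above.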
Is there any size difference?
Yes
Is there a preferred choice between the two when I'm storing large strings (~500 KB)?
Yes
Because ascii is more space efficient than UTF-8, and UTF-8 is more space efficient than UTF-16. Again, how much you save depends on how you serialize/encode/decode the data. For more, check out "what-is-the-advantage-of-choosing-ascii-encoding-over-utf-8".