One problem with blob for me is that, in Java, ByteBuffer (which is mapped to blob in Cassandra) is not Serializable and hence does not work well with EJBs.
Considering the JSON is fairly large, what would be the better type for storing JSON in Cassandra: text or blob?
Does the size of the JSON matter when deciding between blob and text?
If it were any other database, like Oracle, it would be common to use blob/clob. But in Cassandra, where each cell can hold as much as 2GB, does it matter?
Please treat this question as a choice between text and blob for this case, rather than resorting to suggestions about whether to use a single column for JSON.
I don't think there's any benefit to storing the literal JSON data as a BLOB in Cassandra. At best your storage costs are identical, and in general the APIs are less convenient for working with BLOB types than they are for working with strings/text.
For instance, if you're using their Java API then in order to store the data as a BLOB using a parameterized PreparedStatement you first need to load it all into a ByteBuffer, for instance by packing your JSON data into an InputStream.
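As a rough illustration of that extra step, here is a minimal sketch that uses only the JDK. The class and method names (JsonBlobCodec, toBlob, fromBlob) are made up for this example and are not part of any Cassandra driver; the driver call that would actually bind the value is deliberately left out. With a text column the String would be bound as-is, so none of this would be needed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any driver API.
final class JsonBlobCodec {

    // Encode JSON text as the ByteBuffer that a blob column expects.
    static ByteBuffer toBlob(String json) {
        return ByteBuffer.wrap(json.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a blob value back into JSON text. Working on a duplicate
    // leaves the caller's buffer position untouched.
    static String fromBlob(ByteBuffer blob) {
        return StandardCharsets.UTF_8.decode(blob.duplicate()).toString();
    }

    public static void main(String[] args) {
        String json = "{\"id\":1,\"name\":\"example\"}";
        ByteBuffer blob = toBlob(json);
        System.out.println(blob.remaining() + " bytes in the blob value");
        System.out.println(fromBlob(blob));
    }
}
```

If the non-Serializable ByteBuffer is the sticking point, one option is to carry the plain byte[] (arrays of primitives are Serializable) through the EJB layer and wrap it only at the point where the statement is bound.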
Unless you're dealing with very large JSON snippets that force you to stream your data anyway, that's a fair bit of extra work to get access to the BLOB type. And what would you gain from it? Essentially nothing.
However, I think there's some merit in asking 'Should I store JSON as text, or gzip it and store the compressed data as a BLOB?'.
And the answer to that comes down to how you've configured Cassandra and your table. In particular, as long as you're using Cassandra version 1.1 or later, your tables have compression enabled by default. That may be adequate, particularly if your JSON data is fairly uniform from row to row.
However, Cassandra's built-in compression is applied table-wide, rather than to individual rows. So you may get a better compression ratio by manually compressing your JSON data before storage, writing the compressed bytes into a ByteBuffer, and then shipping the data into Cassandra as a BLOB.
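The manual-compression route might look roughly like the sketch below, again using only the JDK. GzipJsonBlob, compress and decompress are hypothetical names chosen for this example, and the resulting ByteBuffer is what you would then bind to the BLOB column through whichever driver you use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any driver API.
final class GzipJsonBlob {

    // Gzip the UTF-8 bytes of the JSON and wrap them for a blob column.
    static ByteBuffer compress(String json) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream above has already written the trailer.
        return ByteBuffer.wrap(out.toByteArray());
    }

    // Reverse of compress: read the blob bytes and gunzip them back to text.
    static String decompress(ByteBuffer blob) {
        ByteBuffer copy = blob.duplicate(); // leave the caller's position alone
        byte[] compressed = new byte[copy.remaining()];
        copy.get(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a deliberately repetitive JSON array as sample input.
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 500; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"status\":\"active\",\"count\":").append(i).append('}');
        }
        String json = sb.append(']').toString();

        ByteBuffer blob = compress(json);
        System.out.println("raw bytes:     " + json.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("gzipped bytes: " + blob.remaining());
        System.out.println("round trip ok: " + decompress(blob).equals(json));
    }
}
```

How much this saves depends heavily on the data; very small documents can even grow slightly because of the gzip header, so it is worth measuring on real payloads before committing to it.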
So it essentially comes down to a tradeoff in terms of storage space vs. programming convenience vs. CPU usage. I would decide the matter as follows:
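If programming convenience is your biggest concern, use a text column rather than a BLOB;
If CPU usage is your biggest concern, use a text column rather than a BLOB;
If storage space is your biggest concern, compress the JSON yourself and store it as a BLOB.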