Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to store a JSON string in a Cassandra column?

Tags:

Cassandra newbie question. I'm collecting some data from a social networking site using REST calls. So I end up with the data coming back in JSON format.

The JSON is only one of the columns in my table. I'm trying to figure out what the "best practice" is for storing the JSON string.

First I thought of using the map type, but the JSON contains a mix of strings, numerical types, etc. It doesn't seem like I can declare wildcard types for the map key/value. The JSON string can be quite large, probably over 10KB in size. I could potentially store it as a string, but it seems like that would be inefficient. I would assume this is a common task, so I'm sure there are some general guidelines for how to do this.

I know Cassandra has native support for JSON, but from what I understand, that's mostly used when the entire JSON map matches 1-1 with the database schema. That's not the case for me. The schema has a bunch of columns and the JSON string is just a sort of "payload". Is it better to store the JSON string as a blob or as text? BTW, the Cassandra version is 2.1.5.

Any hints appreciated. Thanks in advance.

like image 938
user2337270 Avatar asked Mar 31 '16 19:03

user2337270


People also ask

How do I store a JSON object in Cassandra?

Cassandra Query Language (CQL) supported JSON format in which data is inserted as a string but in actual is stored as a given data type value. For example, if Id has int data type then the value of id will be inserted like “Id”: “101” but in actual it will be stored as an INT.

Does Cassandra support JSON data type?

Cassandra provides support for JSON. You can, of course, store JSON text into Cassandra text columns. But that is not what we mean here. Here we mean you can use JSON to write to Cassandra tables.

How do I query a JSON column in Cassandra?

Assuming user-id is the partition key, and assuming you want to retrieve a JSON object corresponding to user of id 1, you should try: SELECT JSON * FROM mytable WHERE userid=1; If userid is of type text, you will need to add some quotes.

What is JSON blob?

JSON Blob was created to help parallelize client/server development. Mock JSON responses can be defined using the online editor and then clients can use the JSON Blob API to retrieve and update the mock responses. Blobs that are not accessed in 75 DAYS will be removed.


2 Answers

In the Cassandra Storage engine there's really not a big difference between a blob and a text, since Cassandra stores text as blobs essentially. And yes the "native" JSON support you speak of is only for when your data model matches your JSON model, and it's only in Cassandra 2.2+.

I would store it as a text type, and you shouldn't have to implement anything to compress your JSON data when sending the data (or handle uncompressing). Since Cassandra's Binary Protocol supports doing transport compression. Also make sure your table is storing the data compressed with the same compression algorithm (I suggest using LZ4 since it's the fastest algo implmeneted) to save on doing compression for each read request. Thus if you configure storing the data compressed and use transport compression, you don't even have to implement either yourself.

You didn't say which Client Driver you're using, but here's the documentation on how to setup Transport Compression for Datastax Java Client Driver.

like image 108
fromanator Avatar answered Sep 19 '22 14:09

fromanator


It depends on how to want to query your JSON. There are 3 possible strategies:

  1. Store as a string
  2. Store as a compressed blob
  3. Store as a blob

Option 1 has the advantage of being human readable when you query your data on command line with cqlsh or if you want to debug data directly live. The drawback is the size of this JSON column (10k)

Option 2 has the advantage to keep the JSON payload small because text elements have a pretty decent compression ration. Drawbacks are: a. you need to take care of compression/decompression client side and b. it's not human readable directly

Option 3 has drawbacks of option 1 (size) and 2 (not human readable)

like image 22
doanduyhai Avatar answered Sep 17 '22 14:09

doanduyhai