
Cassandra vs MongoDB - Storing JSON data with previously unknown keys?

I'm trying to integrate a NoSQL database to store JSON data, rather than using a SQL database with a column that stores a JSON object.

For MongoDB, I can insert a JSON file just by doing:

document = <JSON OBJECT>
collection.insert(document)

However, according to this webpage, Cassandra cannot be schemaless: http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support

This means I would need to create a table beforehand:

CREATE TABLE users (
    id text PRIMARY KEY,
    age int,
    state text
);

And then insert the data:

INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';

The issue is that I want to try Cassandra (I've just completed DataStax's tutorial), but it seems that I would need to know the keys of the JSON data beforehand, which is not possible.

Or should I alter the table when there is a new data column if there is an unknown key? That doesn't sound like a very good design decision.

Can anyone point me in the right direction? Thanks.

user1157751 asked Oct 06 '15 20:10



1 Answer

This JSON support is very misleading: it's JSON support in CQL (the query language), not in storage.

Or should I alter the table when there is a new data column if there is an unknown key? That doesn't sound like a very good design decision.

Indeed, this isn't a good decision: the fields in your JSON can have different types across entities, so one column name couldn't serve them all. Also, adding a new field requires schema propagation across your cluster, so the very first insert (which would consist of ALTER TABLE plus the insert itself) would be very slow.
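To illustrate the type problem, here is a minimal Python sketch that scans a few JSON documents and reports keys whose value type differs between them. The second document and the helper name `conflicting_keys` are made up for illustration; they are not from the question.

```python
import json


def conflicting_keys(json_docs):
    """Return {key: sorted type names} for keys whose value type varies across documents."""
    seen = {}
    for text in json_docs:
        for key, value in json.loads(text).items():
            # Record the Python type name each key was seen with.
            seen.setdefault(key, set()).add(type(value).__name__)
    # Keep only keys that appeared with more than one type.
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


docs = [
    '{"id": "user123", "age": 42, "state": "TX"}',
    '{"id": "user456", "age": "unknown", "state": "CA"}',  # hypothetical second document
]
conflicts = conflicting_keys(docs)
print(conflicts)  # {'age': ['int', 'str']}
```

A single `age int` column could hold the first document's value but not the second's, which is why one column per JSON key doesn't work in general.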

Cassandra doesn't give you any built-in mechanism, but what you can do is put the whole JSON in one field and expose the properties you need in additional, separate columns. For example:

CREATE TABLE users (
    id text PRIMARY KEY,
    json text, // the whole JSON document, including age and state
    age int    // explicitly duplicated property, e.g. if you need to index it
);
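On the application side, this could look roughly like the Python sketch below: it keeps the raw JSON text and pulls out only the promoted `age` property. The helper names (`to_row`, `promoted_age`) and the extra `nickname` key are assumptions for illustration. The `%s` placeholders are the style the DataStax Python driver uses for non-prepared statements; the sketch itself only builds the values and does not connect to a cluster.

```python
import json

# Matches the users table above: id, raw JSON text, and the duplicated age column.
INSERT_CQL = "INSERT INTO users (id, json, age) VALUES (%s, %s, %s)"


def promoted_age(doc):
    """Return age only if it is a real integer; otherwise None (column stays unset)."""
    age = doc.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, int) and not isinstance(age, bool):
        return age
    return None


def to_row(json_text):
    """Build the (id, json, age) values for one document; unknown keys stay inside json."""
    doc = json.loads(json_text)
    return (doc["id"], json_text, promoted_age(doc))


row = to_row('{"id": "user123", "age": 42, "state": "TX", "nickname": "bob"}')
print(row[0], row[2])
# With a connected driver session this would be sent as: session.execute(INSERT_CQL, row)
```

Keys that were not known in advance (here `nickname`) never touch the schema, and a document whose `age` is not an integer still inserts, just with the `age` column left null.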

BTW, AFAIK Cassandra used to support your case a long time ago, but now it's more 'strongly typed'.

piotrwest answered Oct 20 '22 01:10