I'm trying to integrate a NoSQL database to store JSON data, rather than a SQL database to store JSON data (A column that stores a JSON object).
For MongoDB, I can insert a JSON file just by doing:
document = <JSON OBJECT>
collection.insert(document)
However, for Cassandra, according to this webpage: http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support
It cannot be schema less, meaning that I would need to create a table beforehand:
CREATE TABLE users (
id text PRIMARY KEY,
age int,
state text
);
And then insert the data:
INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';
The issue is that I want to try and use Cassandra, I've just completed DataStax's tutorial, but it seems that I would need to know the keys of the JSON data beforehand, which is not possible.
Or should I alter the table when there is a new data column if there is an unknown key? That doesn't sound like a very good design decision.
Can anyone point me to the right direction? Thanks
If you're using static JSON data and active data that's structured for SQL storage, Postgres is a good shout — its JSONB representation is efficient and allows for indexing. That said, you can use ODBC and BI integration to run SQL queries on MongoDB reporting, too.
Cassandra provides support for JSON. You can, of course, store JSON text into Cassandra text columns.
NoSQL JSON databases handle documents as individual data file objects without using structured tables. A row count or table size does not constrain the number of documents stored in a JSON database. Instead, storage availability is the only limit to data volume.
This JSON support is very misleading - it's JSON in Cql support, not in storage.
Or should I alter the table when there is a new data column if there is an unknown key? That doesn't sound like a very good design decision.
Indeed this isn't good decision - your fields in JSON can have different types across entities - one column name couldn't serve it all. Also, adding new field requires schema propagation across your cluster, so the very first insert (which would contain of alter table + insert data) would be very slow.
Cassandra doesn't give you any built in mechanism, but what you can do, is to put whole JSON in one field and expose needed properties in additional separate columns. For example:
CREATE TABLE users (
id text PRIMARY KEY,
json text, //in json age and state
age int //explicit duplicated property - if you need e.g. index
);
BTW. AFAIK Cassandra used to support your case long time ago, but now it's more 'strongly typed'.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With