Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I store nested data in Cassandra

Consider the following "documents", how these two documents would be stored in a collection.

// collection posts:
{
  id: 1,
  name: "kingsbounty",
  fields: {
    "title": {
      "title": "Game Title",
      "value": "Kings Bounty"
    }
  },
  {
    "body": {
      "title": "Game Description",
      "value": "Kings Bounty is a turn-based fantasy..."
    }
  }
}

// collection posts:
{
  id: 2,
  name: "outrun",
  fields: { 
    "vehicle": {
      "title": "Vehicle",
      "value": "Ferrari Testarossa"
    },
    "color": {
      "title": "Vehicle Color",
      "value": "Red"
    },
    "driver": {
      "title": "Driver",
      "value": "David Hasselhoff"
    }
  }
}

notice how fields is a map of varying size.

since cassandra does not allow defining this type fields <map <map, text>>

i wanted to learn the "cassandra" way to do it, the denormalized way. this way is not denormalized, though would work to store and retrieve arbitrary lengths of nested data.

CREATE TABLE posts (
  id uuid,
  name text,
  fields list<text>
  PRIMARY KEY (id)
);
CREATE INDEX post_name_key ON posts (name);

CREATE TABLE post_fields (
  post_name text,
  field_name text,
  title text,
  value text,
  PRIMARY KEY (post_name, field_name)
);

INSERT INTO posts (id, name, fields) VALUES ( uuid(), 'kingsbounty', [ 'title', 'body' ] );
INSERT INTO posts (id, name, fields) VALUES ( uuid(), 'outrun', [ 'vehicle', 'color', 'driver' ] );

INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'kingsbounty', 'title', 'Game Title', 'Kings Bounty');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'kingsbounty', 'body', 'Game Description', 'Kings Bounty is a turn-based fantasy...');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'vehicle', 'Vehicle', 'Ferrari Testarossa');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'color', 'Vehicle Color', 'Red');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'driver', 'Driver', 'David Hasselhoff');

SELECT fields FROM posts WHERE name = 'kingsbounty';

     fields
    -------------------
     ['title', 'body']

SELECT * FROM post_fields WHERE post_name = 'kingsbounty';

     post_name   | field_name | title            | value
    -------------+------------+------------------+-----------------------------------------
     kingsbounty |       body | Game Description | Kings Bounty is a turn-based fantasy...
     kingsbounty |      title |       Game Title |                            Kings Bounty

SELECT fields FROM posts WHERE name = 'outrun';

     fields
    --------------------------------
     ['vehicle', 'color', 'driver']

SELECT * FROM post_fields WHERE post_name = 'outrun';

     post_name | field_name | title         | value
    -----------+------------+---------------+--------------------
        outrun |      color | Vehicle Color |                Red
        outrun |     driver |        Driver |   David Hasselhoff
        outrun |    vehicle |       Vehicle | Ferrari Testarossa

what's a better, denormalized way to store this type of data?

like image 818
theRemix Avatar asked May 29 '15 22:05

theRemix


1 Answers

jeffj from #cassandra on irc advises that i don't even need the first table.

i'm starting to understand it now.

CREATE TABLE posts (
  name text,
  field text,
  title text,
  value text,
  PRIMARY KEY (name, field)
);

INSERT INTO posts (name, field, title, value) VALUES ( 'kingsbounty', 'title', 'Game Title', 'Kings Bounty');
INSERT INTO posts (name, field, title, value) VALUES ( 'kingsbounty', 'body', 'Game Description', 'Kings Bounty is a turn-based fantasy...');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'vehicle', 'Vehicle', 'Ferrari Testarossa');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'color', 'Vehicle Color', 'Red');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'driver', 'Driver', 'David Hasselhoff');

SELECT field FROM posts WHERE name = 'kingsbounty';

 field
-------
  body
 title

SELECT * FROM posts WHERE name = 'kingsbounty';

 name        | field | title            | value
-------------+-------+------------------+-----------------------------------------
 kingsbounty |  body | Game Description | Kings Bounty is a turn-based fantasy...
 kingsbounty | title |       Game Title |                            Kings Bounty

SELECT fields FROM posts WHERE name = 'outrun';

 field
---------
   color
  driver
 vehicle


SELECT * FROM posts WHERE name = 'outrun';

 name   | field   | title         | value
--------+---------+---------------+--------------------
 outrun |   color | Vehicle Color |                Red
 outrun |  driver |        Driver |   David Hasselhoff
 outrun | vehicle |       Vehicle | Ferrari Testarossa
like image 80
theRemix Avatar answered Nov 06 '22 08:11

theRemix