
Are there any "gotchas" in deploying a Cassandra cluster to a set of Linode VPS instances?

I am learning about the Apache Cassandra database [sic].

Does anyone have any good/bad experiences with deploying Cassandra to less than dedicated hardware like the offerings of Linode or Slicehost?

I think Cassandra would be a great way to scale a web service easily to meet read/write/request load: just add another Linode running a Cassandra node to the existing cluster. Yes, this implies running the public web service and a Cassandra node on the same VPS (to which many may take exception).

Pros of Linode-like deployment for Cassandra:

  • Private VLAN; the Cassandra nodes could communicate privately
  • An API to provision a new Linode (and perhaps configure it with a "StackScript" that installs Cassandra and its dependencies, etc.)
  • The price is right

Cons:

  • Each host is a VPS and is not dedicated of course
  • The RAM/cost ratio is not that great once you decide you want 4GB RAM (cf. dedicated at say SoftLayer)
  • Only one disk, where two would be preferable (one for the commit log and another for the data files themselves). Probably moot, since this is shared hardware anyway.

EDIT: found this which helps a bit: http://wiki.apache.org/cassandra/CassandraHardware

I see that 1GB is listed as the minimum, but is that a hard requirement or only a recommendation? Could I deploy with a Linode 720, for instance (say 500 MB usable by Cassandra)? See http://www.linode.com/

z8000 asked Feb 18 '10 19:02



1 Answer

How much RAM you need really depends on your workload: if you are write-mostly you can get away with less; otherwise you will want RAM for the read cache.
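
As a rough way to put numbers on the RAM question, the sketch below reproduces the default heap rule that I believe the `cassandra-env.sh` script in later Cassandra releases applies when `MAX_HEAP_SIZE` is not set: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)). This rule postdates the question and may differ in your version, so treat it as an assumption and check the script you actually ship. The function name `default_max_heap_mb` is hypothetical, and the sample sizes come from the thread (a 256MB VM, a Linode 720, and a 4GB host).

```python
def default_max_heap_mb(system_ram_mb: int) -> int:
    """Assumed default heap rule: max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))."""
    half_capped = min(system_ram_mb // 2, 1024)      # half of RAM, capped at 1 GB
    quarter_capped = min(system_ram_mb // 4, 8192)   # quarter of RAM, capped at 8 GB
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_mb in (256, 720, 4096):
        heap_mb = default_max_heap_mb(ram_mb)
        remaining_mb = ram_mb - heap_mb
        print(f"{ram_mb} MB RAM -> {heap_mb} MB heap, "
              f"{remaining_mb} MB left for the OS, page cache, and anything else on the box")
```

Under this rule a 720 MB instance would get a 360 MB heap, leaving the other half for the operating system, the page cache, and any web service sharing the VPS. Whether that is workable still depends on the read/write mix described above.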

You do get more RAM for your money at my employer, Rackspace Cloud: http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing. (Our machines also have RAIDed disks, so people typically see better I/O performance than on EC2. I don't know how Linode compares.)

With most VPS providers, the next-size instance costs roughly 2x as much, i.e., about the same as adding a second small instance. I would therefore recommend going with fewer, larger instances rather than more, smaller ones, since in a small cluster network overhead is not negligible.
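
To illustrate the trade-off with arithmetic, here is a minimal sketch comparing two layouts that cost the same. All numbers are hypothetical (a "small" instance at 1 price unit with 1024 MB, a "large" one at 2 units with 2048 MB), and the network model is deliberately crude: it assumes data is spread evenly, a request lands on a random node, and a read needs only one replica, so the chance that the receiving node can answer locally is min(1, RF / N) for replication factor RF and N nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    nodes: int
    ram_mb_per_node: int     # hypothetical instance size
    price_per_node: float    # hypothetical price units, not real pricing


def total_price(layout: Layout) -> float:
    return layout.nodes * layout.price_per_node


def total_ram_mb(layout: Layout) -> int:
    return layout.nodes * layout.ram_mb_per_node


def local_replica_probability(nodes: int, replication_factor: int) -> float:
    """Chance that a randomly chosen node holds a replica of the requested row."""
    return min(1.0, replication_factor / nodes)


if __name__ == "__main__":
    replication_factor = 2  # assumed for illustration
    layouts = [
        Layout("4 small", nodes=4, ram_mb_per_node=1024, price_per_node=1.0),
        Layout("2 large", nodes=2, ram_mb_per_node=2048, price_per_node=2.0),
    ]
    for layout in layouts:
        p_local = local_replica_probability(layout.nodes, replication_factor)
        print(f"{layout.name}: price={total_price(layout)}, "
              f"RAM={total_ram_mb(layout)} MB, local-read chance={p_local:.2f}")
```

Under these assumptions both layouts have the same price and total RAM, but with four small nodes about half of the single-replica reads need an extra network hop, while with two large nodes none do. Real clusters also pay network cost for writes and for stronger consistency levels, which this sketch ignores.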

I do know someone using Cassandra on 256MB VMs but you're definitely in the minority if you go that small.

jbellis answered Sep 28 '22 05:09