I am learning about the Apache Cassandra database [sic].
Does anyone have any good or bad experiences deploying Cassandra to less-than-dedicated hardware, such as the offerings of Linode or Slicehost?
I think Cassandra would be a great way to scale a web service easily to meet read/write/request load: just add another Linode running a Cassandra node to the existing cluster. Yes, this implies running the public web service and a Cassandra node on the same VPS (which many may take exception to).
Pros of Linode-like deployment for Cassandra:
Cons:
EDIT: found this which helps a bit: http://wiki.apache.org/cassandra/CassandraHardware
I see that 1GB is the minimum, but is this just a recommendation? Could I deploy on a Linode 720, for instance (say 500 MB usable by Cassandra)? See http://www.linode.com/
Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node for uncompressed data. For Cassandra 1.1, it is 500 to 800GB per node. Be sure to account for replication.
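As a rough illustration of what "account for replication" means for sizing, the sketch below estimates a minimum node count. The function name, the 10 TB dataset, and the replication factor of 3 are assumptions made up for the example; only the per-node limits come from the figures above.

```python
import math

def nodes_needed(raw_data_tb, replication_factor, per_node_capacity_tb):
    """Estimate the minimum node count for a dataset (illustrative only).

    Every row is stored replication_factor times, so the cluster as a
    whole must hold raw_data_tb * replication_factor.
    """
    if per_node_capacity_tb <= 0:
        raise ValueError("per_node_capacity_tb must be positive")
    total_tb = raw_data_tb * replication_factor
    by_capacity = math.ceil(total_tb / per_node_capacity_tb)
    # Assumption: each replica lives on a different node, so the cluster
    # needs at least replication_factor nodes regardless of data size.
    return max(replication_factor, by_capacity)

# Hypothetical workload: 10 TB of uncompressed data, replication factor 3.
print(nodes_needed(10, 3, 3))    # at 3 TB per node
print(nodes_needed(10, 3, 0.5))  # at 500 GB per node
```

This ignores headroom for compaction and growth, so treat the result as a lower bound rather than a plan.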
Cassandra has a ring-type architecture with no master nodes and no single point of failure. It supports network topologies that span multiple data centers, racks, and nodes.
Because of its peer-to-peer distributed architecture, a Cassandra cluster has no single point of failure. Nodes in a cluster communicate with each other for various purposes, and several components are involved in this process. One of them is the seed list: each node is configured with a list of seeds, which is simply a list of other nodes.
Each instance of Cassandra has evolved to contain 256 virtual nodes. The Cassandra server runs the core processes, such as spreading replicas around nodes and routing requests.
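The idea of virtual nodes on a ring can be sketched in a few lines. This is a simplified illustration, not Cassandra's implementation: it hashes with MD5 instead of a real partitioner such as Murmur3Partitioner, it ignores racks and data centers, and the `Ring` class, the `token` helper, and the node names are all hypothetical.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Map a string to a ring position (MD5 is used here for illustration only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes_per_node=256):
        # Each physical node claims many tokens (virtual nodes) on the ring.
        self._entries = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._entries]
        self._node_count = len(set(nodes))

    def replicas(self, key, replication_factor=3):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        wanted = min(replication_factor, self._node_count)
        if wanted <= 0:
            return []
        # First virtual node whose token is >= the key's token owns the key.
        start = bisect.bisect_left(self._tokens, token(key))
        found = []
        for offset in range(len(self._entries)):
            _, node = self._entries[(start + offset) % len(self._entries)]
            if node not in found:
                found.append(node)
                if len(found) == wanted:
                    break
        return found

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))  # three distinct node names
```

Because every node holds many small token ranges, adding a node takes a little data from each existing node rather than splitting a single neighbour's range.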
How much RAM you need really depends on your workload: if you are write-mostly you can get away with less; otherwise you will want RAM for the read cache.
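For a sense of scale on small instances, here is a sketch of how the default JVM heap has been derived from system memory. It assumes the rule documented in the stock `cassandra-env.sh` of that era, max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)); check the script shipped with your version before relying on it. The function name is illustrative.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap in MB, assuming the cassandra-env.sh rule
    max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB))."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)

# Instance sizes in MB, including the 256MB and 720MB cases discussed in this thread.
for ram_mb in (256, 720, 1024, 8192):
    print(ram_mb, "MB RAM ->", default_max_heap_mb(ram_mb), "MB heap")
```

Whatever is left after the heap is what the operating system can use for file caching, which is where a read-heavy workload benefits from a larger instance.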
You do get more RAM for your money at my employer, Rackspace Cloud: http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing. (Our machines also have RAID disks, so people typically see better I/O performance than on EC2. I don't know about Linode.)
With most VPSes you pay roughly 2x for the next-size instance, i.e., about the same as adding a second small instance. I would therefore recommend going with fewer, larger instances rather than more, smaller ones, since with a small number of nodes the network overhead is not negligible.
I do know someone using Cassandra on 256MB VMs but you're definitely in the minority if you go that small.