
High Level Java Client selection for Apache Cassandra [closed]

There are four high-level APIs for accessing Cassandra, and I do not have time to try them all. So I am hoping to find somebody who can help me choose the right one.

I'll try to write down my findings about them:

Datanucleus-Cassandra-Plugin

pros:

  • supports JPA1, JPA2 and JDO1 through JDO3 (according to a review I read, JDO scales better than Hibernate with JPA)
  • all the pros mentioned under kundera?

cons:

  • no experience with JDO up to now (relevant only for me, of course ;)
  • I could not find any documentation!

kundera

pros:

  • JPA 1.0 annotations with all their advantages (standards-conformant, no boilerplate code, ...)
  • promises the following features in the near future: JPA listeners (@PrePersist, @PostPersist etc.), relationships (@OneToMany, @ManyToMany etc.), transactional support (@Transactional)

cons:

  • early development stage of the plugin?
  • bugs?
  • no possibility to fix problems in the JDO / JPA framework?

s7 pelops

pros:

  • pure Java API --> finer control over persistence?

cons:

  • pure Java API --> boilerplate code

hector 0.7

pros:

  • mavenized
  • Spring integration --> dependency injection
  • pure Java API --> finer control over persistence?
  • JMX monitoring?
  • managing nodes seems to be easy and flexible

cons:

  • pure Java API (no annotations) --> boilerplate code

Conclusion so far

As I am comfortable with RDBMS, Hibernate, JPA and Spring, and not so up to date anymore with EJB, my first impression was that kundera would be the right choice. But after reading some posts about JDO and DataNucleus, I am not sure anymore. As the learning curve for DataNucleus is probably steep (also for experienced JPA developers?), I am not sure whether I should go for it.

My major concern is the status of the plugin. I am also concerned about forum support and help for JDO and the Datanucleus-Cassandra-Plugin, as it is not as widespread, as far as I understand.

Is there anybody out there who already has experience with some of these frameworks and can give me a hint? Maybe a mixed strategy would make sense as well: in cases (if they exist) where JDO is not flexible or sufficient enough for my needs, could I fall back to one of the simpler APIs, pelops or hector? Is this possible? Is there an approach similar to the one in JPA, where you can get an SQL connection and fetch/put data directly?


After reading a bit further, I found the following additional information:

The Datanucleus-Cassandra-Plugin is based on pelops, which can also be accessed directly for more flexibility and more performance (?). Direct pelops access should be used for the column families with a lot of data; JDO/JPA access should be used only for "administrative" data, where performance is not so important and the amount of data is not overwhelming.

Which still leaves open the question of whether to start with hector or pelops:

pelops for its later extensibility with the Datanucleus-Cassandra-Plugin, or hector for its better support for node handling.
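To make the mixed strategy concrete, here is a rough sketch of how routing by column family could look. Everything in it is an assumption for illustration: `RowStore`, `InMemoryRowStore` and `MixedStrategyStore` are hypothetical names, not part of DataNucleus, kundera, pelops or hector, and the in-memory store only stands in for a real backend so the example runs without a cluster.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical storage abstraction; not an API of any library named above. */
interface RowStore {
    void put(String columnFamily, String rowKey, Map<String, String> columns);
    Map<String, String> get(String columnFamily, String rowKey);
}

/** In-memory stand-in so the sketch runs without a Cassandra cluster. */
final class InMemoryRowStore implements RowStore {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    @Override
    public void put(String columnFamily, String rowKey, Map<String, String> columns) {
        rows.put(columnFamily + "/" + rowKey, new HashMap<>(columns));
    }

    @Override
    public Map<String, String> get(String columnFamily, String rowKey) {
        return rows.get(columnFamily + "/" + rowKey); // null if the row is absent
    }

    int size() {
        return rows.size();
    }
}

/** Routes high-volume column families to the low-level store, the rest to the mapped one. */
final class MixedStrategyStore implements RowStore {
    private final RowStore mapped;    // would be backed by JDO/JPA in a real setup
    private final RowStore lowLevel;  // would be backed by pelops or hector
    private final Set<String> highVolumeFamilies;

    MixedStrategyStore(RowStore mapped, RowStore lowLevel, Set<String> highVolumeFamilies) {
        this.mapped = mapped;
        this.lowLevel = lowLevel;
        this.highVolumeFamilies = new HashSet<>(highVolumeFamilies);
    }

    private RowStore backendFor(String columnFamily) {
        return highVolumeFamilies.contains(columnFamily) ? lowLevel : mapped;
    }

    @Override
    public void put(String columnFamily, String rowKey, Map<String, String> columns) {
        backendFor(columnFamily).put(columnFamily, rowKey, columns);
    }

    @Override
    public Map<String, String> get(String columnFamily, String rowKey) {
        return backendFor(columnFamily).get(columnFamily, rowKey);
    }
}

final class MixedStrategyDemo {
    public static void main(String[] args) {
        InMemoryRowStore mapped = new InMemoryRowStore();
        InMemoryRowStore lowLevel = new InMemoryRowStore();
        Set<String> highVolume = new HashSet<>();
        highVolume.add("Events"); // hypothetical high-volume column family

        RowStore store = new MixedStrategyStore(mapped, lowLevel, highVolume);
        Map<String, String> columns = new HashMap<>();
        columns.put("name", "andreas");
        store.put("Users", "u1", columns);   // goes to the mapped store
        store.put("Events", "e1", columns);  // goes to the low-level store

        System.out.println(mapped.size() + " mapped row(s), " + lowLevel.size() + " low-level row(s)");
    }
}
```

The point of the sketch is only that the choice of access path can be hidden behind one interface, so starting with one client does not rule out adding the other later.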

asked Mar 08 '11 11:03 by andreas



1 Answer

I tried most of these solutions and found hector to be the best. Even when you run into a problem, you can always reach the people who wrote hector in #cassandra on freenode, and the code is more mature as far as I can tell. In a Cassandra client, the most critical part is connection pool management: all the clients perform mostly the same operations through Thrift, so connection pooling is what makes a high-level client stand out. On that basis I would vote for hector, since I have been using it in production for over a year now with no visible problems (one reconnect issue was fixed as soon as I discovered it and sent an email about it).

I am still using Cassandra 0.6, though.
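To illustrate what I mean by connection pool management, here is a minimal sketch of the idea. It is not hector's implementation or API: `NodeConnection`, `FakeConnection` and `SimpleConnectionPool` are hypothetical names, and the fake connection only stands in for a real Thrift connection. The pool spreads connections over the known hosts round-robin and silently discards connections that died while idle, which is roughly the kind of reconnect handling a client has to get right.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a Thrift connection to one Cassandra node. */
interface NodeConnection {
    String host();
    boolean isOpen();
}

/** Fake connection so the sketch runs without a cluster. */
final class FakeConnection implements NodeConnection {
    private final String host;
    private boolean open = true;

    FakeConnection(String host) { this.host = host; }

    @Override public String host() { return host; }
    @Override public boolean isOpen() { return open; }
    void close() { open = false; }
}

/** Minimal pool: hands out idle connections and replaces dead ones. */
final class SimpleConnectionPool {
    private final Deque<NodeConnection> idle = new ArrayDeque<>();
    private final List<String> hosts;
    private final Function<String, NodeConnection> connector;
    private int nextHost = 0;

    SimpleConnectionPool(List<String> hosts, int size, Function<String, NodeConnection> connector) {
        this.hosts = new ArrayList<>(hosts);
        this.connector = connector;
        for (int i = 0; i < size; i++) {
            idle.push(connect());
        }
    }

    /** Opens a connection to the next host in round-robin order. */
    private NodeConnection connect() {
        String host = hosts.get(nextHost);
        nextHost = (nextHost + 1) % hosts.size();
        return connector.apply(host);
    }

    /** Returns a live connection, skipping any that died while idle. */
    synchronized NodeConnection borrow() {
        while (!idle.isEmpty()) {
            NodeConnection candidate = idle.pop();
            if (candidate.isOpen()) {
                return candidate;
            }
        }
        return connect(); // nothing usable left: open a fresh connection
    }

    /** Puts a connection back; broken connections are dropped instead. */
    synchronized void release(NodeConnection connection) {
        if (connection.isOpen()) {
            idle.push(connection);
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}

final class PoolDemo {
    public static void main(String[] args) {
        SimpleConnectionPool pool =
                new SimpleConnectionPool(Arrays.asList("node1", "node2"), 2, FakeConnection::new);
        NodeConnection connection = pool.borrow();
        System.out.println("borrowed connection to " + connection.host());
        pool.release(connection);
        System.out.println("idle connections: " + pool.idleCount());
    }
}
```

A real client additionally needs limits on pool growth, timeouts and per-host health tracking; the sketch leaves all of that out.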

answered Oct 26 '22 09:10 by frail