Are there any distinct advantages to using CQL over Thrift, or is it simply a case of developers being too used to SQL? I want to switch from Thrift querying to CQL; the only problem is that I'm not sure about the downsides of doing so. What are they?
Thrift is an RPC framework that combines a wire protocol with a code generation tool. Cassandra exposed a Thrift API because it makes the database easy to reach from many programming languages.
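In the Cassandra context, CQL means the Cassandra Query Language, Cassandra's SQL-like query language. It should not be confused with the Clinical Quality Language, a Health Level Seven International® (HL7®) authoring language standard that is intended to be human readable and is part of the effort to harmonize standards used for electronic clinical quality measures (eCQMs) and clinical decision support (CDS).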
No joins. You cannot perform joins in Cassandra. If you have designed a data model and find that you need something like a join, you'll have to either do the work on the client side, or create a denormalized second table that represents the join results for you.
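As a rough illustration of "doing the work on the client side", the sketch below joins two result sets in memory. It is a minimal example under stated assumptions: the class name, the method name, and the "users"/"messages" data are all hypothetical, and the plain Java collections stand in for rows that would already have been fetched from two separate Cassandra tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the collections below stand in for rows already
// fetched from two separate Cassandra tables ("users" and "messages").
class ClientSideJoinSketch {

    // Joins each message to its author's display name by user id.
    // usersById: user id -> display name (result of one query)
    // messages:  each entry is {user id, message body} (result of another query)
    static List<String> joinMessagesWithAuthors(Map<String, String> usersById,
                                                List<String[]> messages) {
        List<String> joined = new ArrayList<>();
        for (String[] message : messages) {
            String userId = message[0];
            String body = message[1];
            // Look up the other side of the join in memory.
            String name = usersById.getOrDefault(userId, "(unknown user)");
            joined.add(name + ": " + body);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> usersById = new HashMap<>();
        usersById.put("u1", "Lyuben");

        List<String[]> messages = new ArrayList<>();
        messages.add(new String[] {"u1", "hello"});
        messages.add(new String[] {"u2", "orphaned"});

        for (String line : joinMessagesWithAuthors(usersById, messages)) {
            System.out.println(line);
        }
    }
}
```

The alternative mentioned above, a denormalized second table, moves this work to write time: the application writes the already-joined row once so that reads need only a single query.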
Lyuben's answer is a good one, but I believe he may be misinformed on a few points. First, you should be aware that the Thrift API is not going to get new features; it's there for backwards compatibility and is not recommended for new projects. There are already some features that cannot be used through the Thrift interface.
Another factor is that the quoted benchmarks from Acunu are misleading; they don't measure the performance of CQL with prepared statements. See, for example, the graphs at https://issues.apache.org/jira/browse/CASSANDRA-3634 (probably the same data set on which the Acunu post is based, since Eric Evans wrote both). There have also been some improvements to CQL parsing and execution speed in the last year. It is not likely that you will observe any real speed difference between CQL 3 and Thrift.
Finally, I don't think I even agree that Thrift is more flexible. The CQL 3 data model allows using the same data structures that Thrift does for nearly all usages that are not antipatterns; it just allows you to think about the model in a more organized way. For example, Lyuben mentioned rows with differing numbers of columns. A CQL 3 table can still use that capability: there is a difference between "storage engine rows" (which is Cassandra's low-level storage, and what Thrift uses directly) and "CQL rows" (what you see through the CQL interface). CQL just does the extra work necessary to present wide storage engine rows as structured tables.
It's a little difficult to explain in a quick SO answer, but see this post for a somewhat gentle explanation.
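To make the storage engine row versus CQL row distinction a bit more concrete, here is a small in-memory sketch. It is a simplified model, not Cassandra's actual implementation: the class and method names are hypothetical, and the "clustering:column" string is only a stand-in for a composite cell name, not the real on-disk format. One wide storage row (all cells under a single partition key) is regrouped into several CQL-style rows, one per clustering value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how one wide storage engine row can be
// presented as several structured CQL rows.
class WideRowSketch {

    // Groups the cells of ONE storage engine row into CQL-style rows.
    // Each cell name is "<clustering value>:<column name>", a simplified
    // stand-in for a composite cell name.
    static Map<String, Map<String, String>> toCqlRows(TreeMap<String, String> storageRow) {
        Map<String, Map<String, String>> cqlRows = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : storageRow.entrySet()) {
            String[] parts = cell.getKey().split(":", 2);
            String clustering = parts[0];
            String column = parts[1];
            // Cells that share a clustering value belong to the same CQL row.
            cqlRows.computeIfAbsent(clustering, k -> new LinkedHashMap<>())
                   .put(column, cell.getValue());
        }
        return cqlRows;
    }

    public static void main(String[] args) {
        // All cells stored under a single partition key, sorted by cell name.
        TreeMap<String, String> storageRow = new TreeMap<>();
        storageRow.put("1363115059:body", "first message");
        storageRow.put("1363115059:subject", "hi");
        storageRow.put("1363115060:body", "second message");

        System.out.println(toCqlRows(storageRow));
    }
}
```

In this sketch the second CQL row simply has no subject cell, which is how a structured table can still sit on top of storage rows with differing numbers of columns.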
Querying
In CQL you can query Cassandra and get data in a couple of lines (using the JDBC driver):
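// con is an open java.sql.Connection obtained from the Cassandra JDBC driver
String query = "SELECT * FROM message;";
PreparedStatement statement = con.prepareStatement(query);
ResultSet rows = statement.executeQuery(); // runs the query and returns the rows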
With Thrift-based APIs it's a bit more complicated (example with Astyanax):
OperationResult<ColumnList<String>> result =
    keyspace.prepareQuery(mail /* specify column family structure */)
            .getKey("lyuben@1363115059")
            .execute();
ColumnList<String> columns = result.getResult();
Performance
Based on the benchmarks carried out by Acunu, Thrift (RPC) is slightly ahead of CQL when it comes to query performance, but you need to be in a situation where high throughput is key for this performance advantage to have a significant benefit.
Some excellent articles to look up are:
EDIT
The above benchmarks are outdated; the paul provided newer benchmarks that use prepared statements.