
Cassandra NOT EQUAL Operator

Question to all Cassandra experts out there.

I have a column family with about a million records.

I would like to query these records with a not-equal-to condition, i.e. fetch every record whose column value differs from a given value.

I Googled this, and it seems I have to use some sort of MapReduce.

Can somebody tell me what options are available?

Babu James asked Feb 21 '14 04:02



1 Answer

I can suggest a few approaches.

1) If you have a limited number of values that you would like to test for inequality, consider modeling those as boolean columns (e.g., a column isEqualToUnitedStates holding true or false).
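A minimal Python sketch of approach 1. The `country` field and the `is_united_states` flag name are hypothetical, and the rows are plain dicts standing in for table rows. The flag is computed once at write time, so the later "not equal" test becomes an equality test on the flag. How the flag column is made queryable (as part of the key, or through an index) depends on your schema and is not covered here.

```python
def with_flag(row, field="country", value="United States",
              flag="is_united_states"):
    """Return a copy of row with a precomputed boolean equality flag.

    The field and flag names are illustrative assumptions.
    """
    flagged = dict(row)
    flagged[flag] = (row.get(field) == value)
    return flagged


# Rows as they would be prepared before being written to the table.
rows = [with_flag(r) for r in [
    {"id": 1, "country": "United States"},
    {"id": 2, "country": "India"},
    {"id": 3, "country": "Germany"},
]]

# "country != 'United States'" is now "is_united_states = false".
not_us = [r["id"] for r in rows if r["is_united_states"] is False]
```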

2) Otherwise, consider emulating the unsupported query != X by running two separate queries, < X and > X, and combining their results on the client side.
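A rough Python sketch of approach 2. `not_equal_queries` only builds the two CQL strings. The table, column, and filter names passed to it are placeholders, and range restrictions like these typically require the column to be a clustering column with the partition key already restricted. `emulate_not_equal` shows the client-side union on in-memory rows, so no driver is needed to try it.

```python
def not_equal_queries(table, column, partition_filter):
    """Build the two range queries whose union emulates `column != ?`.

    All names passed in are placeholders chosen by the caller.
    """
    base = f"SELECT * FROM {table} WHERE {partition_filter} AND {column}"
    return [f"{base} < %s", f"{base} > %s"]


def emulate_not_equal(rows, column, value):
    """Union the '< value' and '> value' results, as the client would."""
    below = [r for r in rows if r[column] < value]
    above = [r for r in rows if r[column] > value]
    return below + above


queries = not_equal_queries("purchases", "item", "customer = %s")

# In-memory stand-in for the rows the two queries would return.
sample = [{"item": "Gadget"}, {"item": "Widget"}, {"item": "Zipper"}]
result = emulate_not_equal(sample, "item", "Widget")
```

Because the two ranges are disjoint, the union needs no de-duplication.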

3) If your schema cannot support either type of query above, you may have to resort to writing custom routines that will do client-side filtering and construct the not-equal set dynamically. This will work if you can first narrow down your search space to manageable proportions, such that it's relatively cheap to run the query without the not-equal.

So let's say you're interested in all purchases by a particular customer of every product type except Widget. An ideal query would look something like SELECT * FROM purchases WHERE customer = 'Bob' AND item != 'Widget'; You cannot run this, of course, but in this case you should be able to run SELECT * FROM purchases WHERE customer = 'Bob'; without wasting too many resources, and then filter out item = 'Widget' in the client application.
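The client-side filter from the Bob example could be sketched in Python as follows. The rows are hard-coded stand-ins for what the supported query (WHERE customer = 'Bob') would return. With a real driver they would come from executing that query, and `purchases_except` is a hypothetical helper name.

```python
def purchases_except(rows, excluded_item):
    """Keep only rows whose item differs from excluded_item."""
    return [r for r in rows if r["item"] != excluded_item]


# Stand-in for: SELECT * FROM purchases WHERE customer = 'Bob'
bobs_purchases = [
    {"customer": "Bob", "item": "Widget"},
    {"customer": "Bob", "item": "Gadget"},
    {"customer": "Bob", "item": "Sprocket"},
    {"customer": "Bob", "item": "Widget"},
]

non_widgets = purchases_except(bobs_purchases, "Widget")
items = [r["item"] for r in non_widgets]
```

This stays cheap only as long as the narrowed result set (all of Bob's purchases) is small enough to pull to the client.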

4) Finally, if there is no way to restrict the data in a meaningful way before doing the scan (querying without the equality check would return too many rows to handle comfortably), you may have to resort to MapReduce. This means running a distributed job that scans all rows in the table across the cluster. Such jobs run a lot slower than native queries and are quite complex to set up. If you want to go this way, please look into Cassandra Hadoop integration.
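To give a feel for what such a full scan involves, here is a miniature Python sketch. It is not the Hadoop integration itself. It assumes the Murmur3 partitioner (tokens from -2^63 to 2^63-1), splits the token ring into contiguous slices that independent workers could scan, and applies the not-equal filter as the "map" step. All function, table, and column names are hypothetical.

```python
MIN_TOKEN = -2**63       # assumes Murmur3Partitioner token bounds
MAX_TOKEN = 2**63 - 1


def split_token_ring(n):
    """Split the token ring into n contiguous (start, end) slices, both inclusive."""
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n
    ranges, start = [], MIN_TOKEN
    for i in range(n):
        # The last slice absorbs any remainder so the ring is fully covered.
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def scan_query(table, key, start, end):
    """CQL text for scanning one slice; table and key are placeholders."""
    return (f"SELECT * FROM {table} "
            f"WHERE token({key}) >= {start} AND token({key}) <= {end}")


def map_not_equal(rows, column, value):
    """The 'map' step: keep rows of one slice whose column differs from value."""
    return [r for r in rows if r[column] != value]


slices = split_token_ring(4)
first_query = scan_query("purchases", "customer", *slices[0])
kept = map_not_equal([{"item": "Widget"}, {"item": "Gadget"}], "item", "Widget")
```

Each slice can be scanned and filtered independently, and the per-slice results are then concatenated. A real job would also have to handle paging, retries, and data locality.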

Daniel S. answered Oct 04 '22 07:10