Cassandra Full-Text Search

Tags: full-text-search, cassandra, cql

Full-text search in Cassandra

I am fairly new to Cassandra and want to understand it better. I am trying to implement full-text search in Cassandra, but after some research it seems there may not be a "simple" approach for this. I say "may" because the first page of Google results hasn't said much of anything.

So instead I am now trying to understand what the best approach is. This led me to make my own assumptions from what I've learned so far about Cassandra, based on these two principles: a) design your tables around your queries rather than around the data, and b) more data is a good thing, as long as it is being used properly.

With that being said, I've come up with a couple of solutions I'd like to share. If anyone has a better idea, please fill me in before I commit to anything unreasonable or naive.

First Solution: Create a Column Family (CF) with a compound primary key (a partition key plus a clustering column) and a secondary index, like so:

CREATE TABLE "FullTextSearch" (
    "PartialText" text,
    "TargetIdentifier" uuid,
    "CompleteText" text,
    "Type" int,
    PRIMARY KEY ("PartialText", "TargetIdentifier")
);
CREATE INDEX IX_FullTextSearch_Type ON "keyspace"."FullTextSearch" ("Type");

With the above table, I would need to insert rows for the text "Hello World" as follows:

BEGIN BATCH
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('H',00000000-0000-0000-0000-000000000000,'Hello World',1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('He',00000000-0000-0000-0000-000000000000,'Hello World',1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('Hel',00000000-0000-0000-0000-000000000000,'Hello World',1);
.....
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('Hello Wor',00000000-0000-0000-0000-000000000000,'Hello World',1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('Hello Worl',00000000-0000-0000-0000-000000000000,'Hello World',1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('Hello World',00000000-0000-0000-0000-000000000000,'Hello World',1);
.....
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('Wor',00000000-0000-0000-0000-000000000000,'Hello World',1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('Worl',00000000-0000-0000-0000-000000000000,'Hello World',1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ('World',00000000-0000-0000-0000-000000000000,'Hello World',1);
APPLY BATCH;

Basically, the above will satisfy wildcard/partial-text searches such as "%o W%", "Hello%", and "Worl%". However, it will not satisfy fragments from inside a word, such as "%ell%" for "Hello", which I can feel alright about for now... (my OCD sort of kicks in here).
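For illustration, the partial texts for one value could be generated client-side before building the inserts. This is only a minimal sketch under one assumption: a partial text is any prefix of the complete text that begins at the start of a word. The function name `partial_texts` is made up for this example. Under that assumption "Hello%" and "Worl%" are covered, but fragments such as "ell" or "o W" are not produced, so covering "%o W%" would need extra starting positions.

```python
def partial_texts(text):
    """Return every prefix of `text` that begins at the start of a word.

    Assumption: partial texts start only at word boundaries, so fragments
    from the middle of a word (such as "ell") are never produced.
    """
    partials = []
    seen = set()
    start = 0  # index in `text` where the current word begins
    for word in text.split(" "):
        if word:  # skip empty strings produced by repeated spaces
            remainder = text[start:]
            for end in range(1, len(remainder) + 1):
                candidate = remainder[:end]
                if candidate not in seen:
                    seen.add(candidate)
                    partials.append(candidate)
        start += len(word) + 1  # step past the word and the space after it
    return partials


if __name__ == "__main__":
    # Each printed value would become one "PartialText" row for the target.
    for partial in partial_texts("Hello World"):
        print(repr(partial))
```

Each returned string would map to one row in "FullTextSearch", so the row count grows roughly with (number of words) x (text length) per target.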

This approach sort of sucks for me because I would now have to delete and re-insert rows any time a save or name change occurs on the "TargetIdentifier".
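To make that maintenance cost concrete, here is a rough sketch of how the rows affected by a rename could be worked out. The helper names `prefixes` and `plan_rename` are hypothetical, and `prefixes` is a deliberately simplified stand-in for whatever rule actually produces the partial texts. Because every row also stores a copy of "CompleteText", rows whose partial text survives the rename still need an update.

```python
def prefixes(text):
    """Simplified stand-in: every prefix of the whole text."""
    return {text[:end] for end in range(1, len(text) + 1)}


def plan_rename(old_text, new_text, partials=prefixes):
    """Work out which "PartialText" rows a rename touches for one target.

    Returns a dict of three sets:
      delete: partial texts that no longer apply
      insert: partial texts that are new
      update: partial texts that stay, but whose "CompleteText" copy is stale
    """
    old = partials(old_text)
    new = partials(new_text)
    return {
        "delete": old - new,
        "insert": new - old,
        "update": old & new,
    }


if __name__ == "__main__":
    plan = plan_rename("Hello", "Help")
    for action in ("delete", "insert", "update"):
        print(action, sorted(plan[action]))
```

Even a small edit touches every row of the target in some way, which is the cost described above.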

Second Solution: very similar, only this time making use of wide columns, where the table might look like:

CREATE TABLE "FullTextSearch" (
"TargetIdentifier" uuid,
"Type" int,
"CompleteText" text,
PRIMARY KEY("TargetIdentifier")
);

and then a search would look something like:

SELECT * FROM "FullTextSearch" WHERE "He" = 1;

so that if the column exists, the respective rows are returned.

Third Solution: similar to the one above, only this time instead of wide columns we use a collection column, such as a map, for the partial texts, and perform a query like:

SELECT * FROM "FullTextSearch" WHERE "PartialTexts"['He'] = 1;

Anyways, I am all out of ideas, it is late, and I can only hope for a great response! Please, let me know what I should be doing here... am I even on the right path?


asked Jul 21 '14 05:07

user1953264

1 Answer

AFAIK DataStax Enterprise Search is the (commercial) successor of Solandra.

Cassandra 2.0 supports so-called "custom secondary indexes". Custom secondary indexes are Java code: your own implementation has to extend the abstract class org.apache.cassandra.db.index.SecondaryIndex (see http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/create_index_r.html).

I'm not sure whether implementations exist for Elasticsearch or Solr.

I would not recommend coding all the weird full-text search logic yourself, such as stemming, multiple/exotic language support, or even geospatial stuff.

But SecondaryIndex would be a good starting point for integrating your favorite search engine.


answered Sep 19 '22 03:09

Robert Stupp
