I started using SASI indexing with the following setup:
CREATE TABLE employee (
id int,
lastname text,
firstname text,
dateofbirth date,
PRIMARY KEY (id, lastname, firstname)
) WITH CLUSTERING ORDER BY (lastname ASC, firstname ASC);
CREATE CUSTOM INDEX employee_firstname_idx ON employee (firstname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false'};
I run the following query:
SELECT * FROM employee WHERE firstname like '%s';
From what I have read, it seems to be the same as a normal secondary index in Cassandra, except that it supports LIKE searches.
1) Could somebody explain how it differs from normal secondary index in Cassandra?
2) What are the best configurations like mode, analyzer_class and case_sensitive - Any recommended documentation for this?
1) Could somebody explain how it differs from normal secondary index in Cassandra?
A normal secondary index is essentially another lookup table made up of the indexed column and the primary key. Hence it has its own set of SSTable files (disk), its own memtable (memory) and its own write overhead (CPU).
SASI is an improvement that Apple open sourced and contributed to the Cassandra community. A SASI index is created for every SSTable that is flushed to disk, and it does not maintain a separate table. Hence it uses less disk, needs no separate memtable, bloom filter or partition index (less memory), and adds minimal overhead.
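For comparison, here is a minimal sketch of a classic (non-SASI) secondary index on the same table. The index name `employee_dob_idx` and the date value are made up for illustration; the point is only that this kind of index is created with plain `CREATE INDEX` and is queried by equality.

```sql
-- Hypothetical index name; a classic secondary index on a regular column.
CREATE INDEX employee_dob_idx ON employee (dateofbirth);

-- Equality lookup served by the classic index (the date value is illustrative).
SELECT * FROM employee WHERE dateofbirth = '1985-03-14';
```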
2) What are the best configurations like mode, analyzer_class and case_sensitive - Any recommended documentation for this?
Configuration depends on your use case.
Essentially there are three modes: PREFIX (the default), CONTAINS and SPARSE.
analyzer_class : An analyzer can be specified to process the text in the indexed column.
case_sensitive : As the name implies, whether the indexed column is searched case-sensitively. Applicable values are true and false.
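As a rough illustration of how these options fit together, the sketch below replaces the CONTAINS index from the question with a PREFIX-mode index that uses the non-tokenizing analyzer for case-insensitive matching. It assumes you only need prefix searches on `firstname`; the search value `'jo%'` is made up, and this is one possible configuration rather than a recommendation.

```sql
-- Replace the existing index rather than adding a second one on the same column.
DROP INDEX IF EXISTS employee_firstname_idx;

CREATE CUSTOM INDEX employee_firstname_idx ON employee (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'mode': 'PREFIX',
  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
  'case_sensitive': 'false'
};

-- PREFIX mode is meant for patterns anchored at the start of the value.
SELECT * FROM employee WHERE firstname LIKE 'jo%';
```

If suffix or substring patterns such as `'%s'` are required, as in the question, CONTAINS mode is the one to keep, at the cost of a larger index.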
Detailed documentation reference here and detailed blog post on performance.
Here is a short summary of SASI from https://github.com/scylladb/scylla/wiki/Indexing-in-Cassandra-3:
SASI (acronym of "SSTable-Attached Secondary Indexing") is a reimplementation of the classic Cassandra secondary indexing with one main goal in mind - efficiently support more sophisticated search queries such as:
Some of these things were already possible with a secondary index, but inefficiently, because they required getting a long list of partitions, reading them (with an inefficient seek to each one) and filtering on them. SASI implements them using a new on-disk format based on B+ trees, and does not reuse regular Cassandra column families or sstables like the classic Secondary Indexing method did.
SASI attaches to each sstable its own immutable index file (and hence the name of this method), and also attaches an index to each memtable. During compaction, the indexes of the files being compacted together are also compacted to create one new index.