Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra not returning all matching columns with a "SELECT x WHERE y IN (z)" query

I have the following column family in Cassandra:

CREATE TABLE item_index (
  foo_id text,
  bar_id text,
  bar_metadata text,
  bar_url text,
  PRIMARY KEY (foo_id, bar_id)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

I am trying to retrieve all the rows matching a list of 25 specific bar_id's using this query in cqlsh:

SELECT * FROM stagekeyspace.item_index WHERE foo_id IN ('1947dccc791ace5eb40dd2f00d9d876f') 
AND bar_id IN ('5f715d9f4a1a97b8fb54996bca5b0d91','64a8708d33b426315480b127a36663fd',
'b5f788e2c5b6e0bdfa3fc76d0f3e4fac','b6da801b86fd27e7382f5b6ce6dedf4d',
'c2e6586c0c9867157a4789a2ba3fb3c1','dc784b35764c6a3fbf083a6da59ad475',
'09c436ce046905b018a1e3fa417ee04a','0b4b5bd9c339353f6c16fcd822f50d6b',
'0c8e2e54c4629767b548830e122f295b','106735c6a97f8c8b006b9e0dbe18585e',
'11135d45b78086bc386bd9e538409915','269e2fbb4ce98f74471ec2fd0fffaa7a',
'2d2462fada062a160e60d537ab58ef81','30c617bd0dc41fde670c3796ad98ff65',
'32b458cae9762541a64e5f60b29f064c','36884ff28272675befb800eccc49b691',
'3b7f30db8c6594c5ec677d3465d47735','4054aedd79682a798d862e431be27636',
'4bee7a9cc7fd74a55e640ef6aca3864c','578e241dad54248261c341526563448b',
'59283ec34b7faa9db1c0befa38e34ea2','64a0fdc8019b32a5768900c6c30a6e66',
'65a767a4e7df8a06701c806f417c1de7','6719ab0291205b374cd50577f0a16ad7',
'6bde8f55ddb7764138d4937dfaf85795');

This returns 21 "rows" even though it should return 25 rows, one for each bar_id.

To confirm that the data is there for all 25 bar_id's, I ran the following query, using one bar_id at a time:

SELECT * FROM stagekeyspace.item_index WHERE foo_id IN 
('1947dccc791ace5eb40dd2f00d9d876f') AND bar_id IN ('5f715d9f4a1a97b8fb54996bca5b0d91');

All 25 queries return one row with the correct data.

Any pointers as to why the first query is not returning all the results it should?

I have other queries in the system that use a SELECT .. WHERE x in ([LIST]) with over 100 items in the LIST and they run without issues. Also, each data "row" is less than 1kb is size.

All of this is running on an Ubuntu image with:

[cqlsh 3.1.6 | Cassandra 1.2.8 | CQL spec 3.0.0 | Thrift protocol 19.36.0]

Edit: Adding a trace as requested. Please note that some the column data has changed and the results are now down to 18 instead of 25:

Tracing session: 4245fd80-a457-11e3-9933-19b599adc7ff

 activity                                             | timestamp    | source       | source_elapsed
------------------------------------------------------+--------------+--------------+----------------
                                   execute_cql3_query | 11:14:03,225 | 10.144.3.175 |              0
 Parsing SELECT count(*) FROM stagekeyspace.image_index WHERE domain_id 
 IN ('1947dccc791ace5eb40dd2f00d9d876f') 
 AND image_id IN ('64a8708d33b426315480b127a36663fd', 
 .., '6bde8f55ddb7764138d4937dfaf85795') LIMIT 10000; | 11:14:03,225 | 10.144.3.175 |             47
      Executing single-partition query on image_index | 11:14:03,226 | 10.144.3.175 |            733
                                   Peparing statement | 11:14:03,225 | 10.144.3.175 |            172
                         Acquiring sstable references | 11:14:03,226 | 10.144.3.175 |            750
                          Merging memtable tombstones | 11:14:03,226 | 10.144.3.175 |            776
                         Key cache hit for sstable 76 | 11:14:03,226 | 10.144.3.175 |            945
    Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 |            959
                         Key cache hit for sstable 75 | 11:14:03,226 | 10.144.3.175 |           1061
    Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 |           1071
                         Key cache hit for sstable 74 | 11:14:03,226 | 10.144.3.175 |           1263
    Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 |           1277
                         Key cache hit for sstable 73 | 11:14:03,226 | 10.144.3.175 |           1421
    Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 |           1433
           Merging data from memtables and 4 sstables | 11:14:03,226 | 10.144.3.175 |           1459
                  Read 18 live and 0 tombstoned cells | 11:14:03,230 | 10.144.3.175 |           5298
                                     Request complete | 11:14:03,230 | 10.144.3.175 |           5845
like image 343
Zack Avatar asked Nov 02 '22 04:11

Zack


1 Answers

Cassandra's a bit of lazy doing full table scans: does appending 'ALLOW FILTERING' solve the issue? E.g.

SELECT * FROM stagekeyspace.item_index WHERE ... ALLOW FILTERING;
like image 140
Daniel Schulz Avatar answered Nov 09 '22 01:11

Daniel Schulz