I have the following column family in Cassandra:
CREATE TABLE item_index (
foo_id text,
bar_id text,
bar_metadata text,
bar_url text,
PRIMARY KEY (foo_id, bar_id)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
I am trying to retrieve all the rows matching a list of 25 specific bar_id's using this query in cqlsh:
SELECT * FROM stagekeyspace.item_index WHERE foo_id IN ('1947dccc791ace5eb40dd2f00d9d876f')
AND bar_id IN ('5f715d9f4a1a97b8fb54996bca5b0d91','64a8708d33b426315480b127a36663fd',
'b5f788e2c5b6e0bdfa3fc76d0f3e4fac','b6da801b86fd27e7382f5b6ce6dedf4d',
'c2e6586c0c9867157a4789a2ba3fb3c1','dc784b35764c6a3fbf083a6da59ad475',
'09c436ce046905b018a1e3fa417ee04a','0b4b5bd9c339353f6c16fcd822f50d6b',
'0c8e2e54c4629767b548830e122f295b','106735c6a97f8c8b006b9e0dbe18585e',
'11135d45b78086bc386bd9e538409915','269e2fbb4ce98f74471ec2fd0fffaa7a',
'2d2462fada062a160e60d537ab58ef81','30c617bd0dc41fde670c3796ad98ff65',
'32b458cae9762541a64e5f60b29f064c','36884ff28272675befb800eccc49b691',
'3b7f30db8c6594c5ec677d3465d47735','4054aedd79682a798d862e431be27636',
'4bee7a9cc7fd74a55e640ef6aca3864c','578e241dad54248261c341526563448b',
'59283ec34b7faa9db1c0befa38e34ea2','64a0fdc8019b32a5768900c6c30a6e66',
'65a767a4e7df8a06701c806f417c1de7','6719ab0291205b374cd50577f0a16ad7',
'6bde8f55ddb7764138d4937dfaf85795');
This returns 21 "rows" even though it should return 25 rows, one for each bar_id.
To confirm that the data is there for all 25 bar_id's, I ran the following query, using one bar_id at a time:
SELECT * FROM stagekeyspace.item_index WHERE foo_id IN
('1947dccc791ace5eb40dd2f00d9d876f') AND bar_id IN ('5f715d9f4a1a97b8fb54996bca5b0d91');
All 25 queries return one row with the correct data.
Any pointers as to why the first query is not returning all the results it should?
I have other queries in the system that use a SELECT .. WHERE x in ([LIST]) with over 100 items in the LIST and they run without issues. Also, each data "row" is less than 1kb is size.
All of this is running on an Ubuntu image with:
[cqlsh 3.1.6 | Cassandra 1.2.8 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Edit: Adding a trace as requested. Please note that some the column data has changed and the results are now down to 18 instead of 25:
Tracing session: 4245fd80-a457-11e3-9933-19b599adc7ff
activity | timestamp | source | source_elapsed
------------------------------------------------------+--------------+--------------+----------------
execute_cql3_query | 11:14:03,225 | 10.144.3.175 | 0
Parsing SELECT count(*) FROM stagekeyspace.image_index WHERE domain_id
IN ('1947dccc791ace5eb40dd2f00d9d876f')
AND image_id IN ('64a8708d33b426315480b127a36663fd',
.., '6bde8f55ddb7764138d4937dfaf85795') LIMIT 10000; | 11:14:03,225 | 10.144.3.175 | 47
Executing single-partition query on image_index | 11:14:03,226 | 10.144.3.175 | 733
Peparing statement | 11:14:03,225 | 10.144.3.175 | 172
Acquiring sstable references | 11:14:03,226 | 10.144.3.175 | 750
Merging memtable tombstones | 11:14:03,226 | 10.144.3.175 | 776
Key cache hit for sstable 76 | 11:14:03,226 | 10.144.3.175 | 945
Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 | 959
Key cache hit for sstable 75 | 11:14:03,226 | 10.144.3.175 | 1061
Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 | 1071
Key cache hit for sstable 74 | 11:14:03,226 | 10.144.3.175 | 1263
Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 | 1277
Key cache hit for sstable 73 | 11:14:03,226 | 10.144.3.175 | 1421
Seeking to partition indexed section in data file | 11:14:03,226 | 10.144.3.175 | 1433
Merging data from memtables and 4 sstables | 11:14:03,226 | 10.144.3.175 | 1459
Read 18 live and 0 tombstoned cells | 11:14:03,230 | 10.144.3.175 | 5298
Request complete | 11:14:03,230 | 10.144.3.175 | 5845
Cassandra's a bit of lazy doing full table scans: does appending 'ALLOW FILTERING' solve the issue? E.g.
SELECT * FROM stagekeyspace.item_index WHERE ... ALLOW FILTERING;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With