Duplicate rows/columns for the same primary key in Cassandra

Tags:

cql

I have a table/columnfamily in Cassandra 3.7 with sensordata.

CREATE TABLE test.sensor_data (
    house_id int,
    sensor_id int,
    time_bucket int,
    sensor_time timestamp,
    sensor_reading map<int, float>,
    PRIMARY KEY ((house_id, sensor_id, time_bucket), sensor_time)
)

Now when I select from this table I find duplicates for the same primary key, something I thought was impossible.

cqlsh:test> select * from sensor_data;

 house_id | sensor_id | time_bucket | sensor_time                     | sensor_reading
----------+-----------+-------------+---------------------------------+----------------
        1 |         2 |           3 | 2016-01-02 03:04:05.000000+0000 |       {1: 101}
        1 |         2 |           3 | 2016-01-02 03:04:05.000000+0000 |       {1: 101}

I think part of the problem is that this data has both been written "live" using java and Datastax java driver, and it has been loaded together with historic data from another source using sstableloader.

Regardless, this shouldn't be possible. I have no way of connecting with the legacy cassandra-cli to this cluster, perhaps that would have told me something that I can't see using cqlsh.

So, the questions are:
* Is there anyway this could happen under known circumstances?
* Can I read more raw data using cqlsh? Specifically write time of these two rows. the writetime()-function can't operate on primary keys or collections, and that is all I have.

Thanks.

Update:

This is what I've tried, from comments, answers and other sources
* selecting using blobAsBigInt gives the same big integer for all identical rows
* connecting using cassandra-cli, after enabling thrift, is possible but reading the table isn't. It's not supported after 3.x
* dumping out using sstabledump is ongoing but expected to take another week or two ;)

730

asked Oct 01 '16 06:10

Andreas Wederbrand

1 Answers

I don't expect to see nanoseconds in a timestamp field and additionally i'm of the impression they're fully not supported? Try this:

SELECT house_id, sensor_id, time_bucket, blobAsBigint(sensor_time) FROM test.sensor_data;

I WAS able to replicate it doing by inserting the rows via an integer:

INSERT INTO sensor_data(house_id, sensor_id, time_bucket, sensor_time) VALUES (1,2,4,1451692800000); INSERT INTO sensor_data(house_id, sensor_id, time_bucket, sensor_time) VALUES (1,2,4,1451692800001);

This makes sense because I would suspect one of your drivers is using a bigint to insert the timestamp, and one is likely actually using the datetime.

Tried playing with both timezones and bigints to reproduce this... seems like only bigint is reproducable

 house_id | sensor_id | time_bucket | sensor_time              | sensor_reading
----------+-----------+-------------+--------------------------+----------------
        1 |         2 |           3 | 2016-01-02 00:00:00+0000 |           null
        1 |         2 |           4 | 2016-01-01 23:00:00+0000 |           null
        1 |         2 |           4 | 2016-01-02 00:00:00+0000 |           null
        1 |         2 |           4 | 2016-01-02 00:00:00+0000 |           null
        1 |         2 |           4 | 2016-01-02 01:01:00+0000 |           null

edit: Tried some shenanigans using bigint in place of datetime insert, managed to reproduce...

129

answered Oct 17 '22 05:10

Highstead

Related questions
                            
                                Cassandra Native Binary Protocol Performance
                            
                                Cassandra 1.2 inserting/updating a blob column type using Python and the cql library
                            
                                Proper cassandra keyspace restore procedure
                            
                                Supporting logical delete for an existing feed table
                            
                                Mapping Joda time to Cassandra's Date type
                            
                                cassandra Read performance with Collection
                            
                                How to handle a change in denormalized data
                            
                                datastax opscenter agents not installing
                            
                                Performing Full Repair after Migrating to Incremental Repair on Cassandra Nodes
                            
                                Cassandra Drop Keyspace Snapshot Cleaning
                            
                                Datamodel for Property Graph over HBase/Cassandra
                            
                                Error connecting to remote JMX agent! while starting Nodetool
                            
                                How to store an image as a Bytes field in cassandra database using cql engine?
                            
                                Import .csv file into cassandra
                            
                                How to delete a large number of rows in Cassandra (and avoid potential tombstone issues)?
                            
                                Huge Leaderboard ranking with filtering
                            
                                Spark JoinWithCassandraTable on TimeStamp partition key STUCK
                            
                                Golang client for Cassandra [closed]
                            
                                Cassandra read error
                            
                                Cassandra 2.2.3 repair exception

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Duplicate rows/columns for the same primary key in Cassandra

Tags:

cassandra

cql

Andreas Wederbrand

People also ask

1 Answers

Highstead

Recent Activity

Donate For Us