Cassandra nodes cannot communicate with each other, causing ReadTimeout

This is on DataStax Enterprise (DSE) version 4.8.5-1, which (I believe) corresponds to Cassandra 2.1.x.

I'm getting a lot of the following errors when querying from our application:

ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'data_retrieved': False, 'required_responses': 1, 'consistency': 1}

Digging into this more: a sample query (run using cqlsh locally on each node) returns on 3 of the nodes in the ring but fails with a ReadTimeout on the rest. It seems that only the nodes holding the replicas return a response, while the other nodes cannot reach the replicas at all.

Is there some configuration or known issue I should be looking at to fix this issue?

When the other nodes fail, I see this error in the logs:

ERROR [MessagingService-Outgoing-/10.0.10.14] 2016-04-25 20:46:46,818  CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Outgoing-/10.0.10.14,5,main]
java.lang.AssertionError: 371205
        at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:290) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:393) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:382) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:271) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:259) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:503) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:490) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.SliceFromReadCommandSerializer.serialize(SliceFromReadCommand.java:168) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:143) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:132) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:121) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:330) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:282) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
        at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]

Nodetool status output

Datacenter: primary
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns    Host ID                               Rack
UN  10.0.10.224  557.95 GB  1       ?       d1b984b0-50d4-4faa-b349-08bc0cf36447  RAC1
UN  10.0.10.225  740.11 GB  1       ?       16ab3c8c-476e-46c2-837c-6dbb89b7d40d  RAC1
UN  10.0.10.12   748.23 GB  1       ?       4127f0d7-6bd0-4dc8-b6a0-3b261e55b44e  RAC1
UN  10.0.10.45   629.27 GB  1       ?       f4499c5d-f892-43b8-97f3-dcce5be51fb8  RAC2
UN  10.0.10.13   592.57 GB  1       ?       41b58044-942d-4e77-a8de-95495b88a073  RAC1
UN  10.0.10.14   616.45 GB  1       ?       d2b568fb-13e1-4ff7-a247-3751a8ca49cf  RAC1
UN  10.0.10.15   623.23 GB  1       ?       fb10e521-8359-409b-bfd8-b27829157a80  RAC1
UN  10.0.10.21   538.56 GB  1       ?       72288b4c-bd1d-4398-9d95-5af312c2f904  RAC2
UN  10.0.10.25   616.63 GB  1       ?       4a8f04ff-a198-44d1-baf4-72cc430cd8a9  RAC2
UN  10.0.10.218  562.98 GB  1       ?       c00c375d-90bb-48c5-a8d0-7102a13db468  RAC2
UN  10.0.10.219  632.58 GB  1       ?       1e2ea144-35bd-412b-89b5-41544a347a75  RAC2
UN  10.0.10.220  746.85 GB  1       ?       d40f59c1-430a-4d96-9d7e-1e846b8eb1fc  RAC2
UN  10.0.10.221  575.89 GB  1       ?       7e407d6b-2bd5-43b4-9116-96ee72a926b2  RAC2
UN  10.0.10.222  639.98 GB  1       ?       bfd04ab8-7679-4474-8d47-984950bdd2c7  RAC1
UN  10.0.10.223  652.58 GB  1       ?       6366cd3e-7910-40bb-8a12-926c53adf95b  RAC1

The code for this assertion is here:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/2.1.1/org/apache/cassandra/utils/ByteBufferUtil.java?av=f#290
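
For illustration, the check at that line appears to reject any value whose length does not fit in a two-byte unsigned length prefix (at most 65535 bytes), in which case the number in the AssertionError would be the offending length. Below is a minimal Python sketch of that idea. It is not the actual Cassandra code; the function name only mirrors the Java method, and the constant name is an assumption.

```python
import struct

# Largest length that fits in a 2-byte unsigned prefix (assumed to match
# the limit enforced by the Java assertion).
MAX_UNSIGNED_SHORT = 0xFFFF  # 65535


def write_with_short_length(value: bytes) -> bytes:
    """Prefix value with its length as a big-endian unsigned 16-bit integer."""
    length = len(value)
    if not 0 <= length <= MAX_UNSIGNED_SHORT:
        # Mirrors "java.lang.AssertionError: <length>" from the log above.
        raise AssertionError(length)
    return struct.pack(">H", length) + value
```

Under this reading, a 371205-byte value cannot be serialized into an inter-node message, which would be consistent with the failure occurring on the outgoing messaging thread.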

There's no obvious schema mismatch when looking at either the system.local or system.peers tables.
Running nodetool describecluster returns UNREACHABLE from some nodes.

asked Apr 25 '16 19:04

c4urself

1 Answer

You are probably hitting the 64K max key size limit: http://wiki.apache.org/cassandra/FAQ#max_key_size

Check your application code: something is probably sending Cassandra a 371205-byte value as part of a primary key. It may even be someone trying to break your application, I don't know, because a 370 KB primary key is highly unlikely to be sensible. Restrict the key size in your application code.

I don't know whether a bug report, fix, or workaround exists for this.
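
One way to restrict this on the application side is to check the encoded size of each key value before it is bound to a query. The sketch below is only an illustration: the helper name, the constant, and the assumption that the 64K limit applies to each encoded key component are mine, not something taken from Cassandra or a driver.

```python
# Assumed limit: 64K minus one, the largest length a 2-byte prefix can hold.
MAX_KEY_COMPONENT_BYTES = 65535


def oversized_key_components(components):
    """Return (name, size) pairs for key values that exceed the assumed limit.

    components is a dict mapping a key column name to a str or bytes value.
    """
    oversized = []
    for name, value in components.items():
        if isinstance(value, str):
            encoded = value.encode("utf-8")
        elif isinstance(value, (bytes, bytearray)):
            encoded = bytes(value)
        else:
            raise TypeError(f"unsupported key type for {name!r}: {type(value).__name__}")
        if len(encoded) > MAX_KEY_COMPONENT_BYTES:
            oversized.append((name, len(encoded)))
    return oversized
```

The application could reject or log a request whenever this returns a non-empty list, which would also help identify which caller is producing the oversized value.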

answered Sep 28 '22 01:09

hll
