I am trying to insert records into Cassandra using a multithreaded Python program, which I run simultaneously on 3 machines. For a while the records are inserted, but later I get the exception below. I am using the driver provided by DataStax.
cassandra.cluster.NoHostAvailable
I searched and found this in the driver documentation (source: https://datastax.github.io/python-driver/api/cassandra/cluster.html):
exception cassandra.cluster.NoHostAvailable
Raised when an operation is attempted but all connections are busy, defunct, closed, or resulted in errors when used.
My questions are:
1. Is this a normal exception to get when there are too many connections to Cassandra?
2. How would I resolve this in a situation where I want to create many connections/sessions to Cassandra? (I know creating too many sessions is not advisable: it impacts server performance, since each session consumes a fair amount of memory.)
Below is the code fragment:
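from multiprocessing.pool import ThreadPool

from cassandra.cluster import Cluster

# One Cluster and one Session, shared by all worker threads
cluster = Cluster(['192.168.1.21'])
session = cluster.connect('myNameSpace')

def insertInToCassandra(catRange):
    for x in catRange:
        # insert the record for product x into the Cassandra table
        # (the insert code was omitted from the original fragment)
        pass

# range() excludes its end value, so each range starts where the previous one ends
ProductRange = [
    range(900, 920),
    range(920, 940),
    range(940, 960),
    range(960, 980),
    range(980, 1000)
]

# Make the pool of workers
pool = ThreadPool(20)
# Process each product range in its own worker thread
# and collect the results
results = pool.map(insertInToCassandra, ProductRange)
# Close the pool and wait for the work to finish
pool.close()
pool.join()
cluster.shutdown()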
The Python module for working with the Cassandra database is the Cassandra driver (the cassandra-driver package on PyPI), originally developed by DataStax. It contains an ORM API (cqlengine) as well as a core API similar in nature to the DB-API used for relational databases. The driver is easily installed with pip: pip install cassandra-driver
Apache Cassandra is a column-family NoSQL data store designed for write-heavy, persistent storage; in Python it is commonly used behind web applications and in data projects.
A Cassandra cluster is a collection of nodes (Cassandra instances), visualized as a ring. Clusters can be configured as “rack aware” or “datacenter aware”, so that data replicas are spread across racks or datacenters and the data can survive a physical outage of the underlying infrastructure.
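To illustrate the ring idea, here is a toy consistent-hashing sketch: each node gets one token from a hash of its name, and a key's replicas are the next rf nodes found walking clockwise from the key's token, roughly in the spirit of SimpleStrategy. This is a simplification under stated assumptions: it uses MD5 (similar in spirit to the legacy RandomPartitioner; the current default is Murmur3Partitioner), ignores virtual nodes, and is not rack or datacenter aware. The class, function, and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key or node name to a position on the ring (MD5, illustration only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node; a key's replicas are the first
    `rf` nodes found walking clockwise from the key's token."""

    def __init__(self, nodes):
        # Sort (token, node) pairs so positions can be binary-searched.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, rf: int):
        n = len(self._entries)
        if rf > n:
            raise ValueError("replication factor exceeds number of nodes")
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token(key)) % n
        # Each node holds exactly one token, so these rf nodes are distinct.
        return [self._entries[(start + i) % n][1] for i in range(rf)]
```

For example, Ring(["node-a", "node-b", "node-c", "node-d"]).replicas("product-900", 3) returns three distinct nodes, and losing one of them still leaves two copies of that key.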
That's a normal exception; it can occur when one or more Cassandra nodes are unavailable, especially if a node gets stuck in long garbage-collection (GC) pauses or otherwise crashes.
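Because the error can be transient (for example, while a node is paused or restarting), one client-side mitigation is to retry with exponential backoff. The sketch below is a generic, stdlib-only helper, not part of the DataStax driver; its name and parameters are hypothetical. It only rides out short outages and does not fix an unhealthy cluster.

```python
import random
import time


def retry_with_backoff(fn, retriable=(Exception,), attempts=5,
                       base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(); on a retriable exception, wait with capped exponential
    backoff plus random jitter and try again. Re-raise after `attempts` tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Delay doubles each attempt, capped at max_delay; "full jitter"
            # picks a random wait in [0, delay] so clients don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

With the driver, you might call it as retry_with_backoff(lambda: session.execute(stmt, params), retriable=(NoHostAvailable,)), importing NoHostAvailable from cassandra.cluster; stmt and params stand for your own statement and values. Plain INSERTs are upserts and safe to repeat, but counter updates and list appends are not idempotent, so be careful retrying those.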
Depending on your replication factor (RF) and consistency level (CL), a single node going offline may or may not break the application. With an RF of 3 and a CL of QUORUM, any individual node failing should be no problem, because the two remaining replicas still form a quorum.
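The arithmetic behind that claim: QUORUM requires a majority of a partition's replicas, floor(RF / 2) + 1, so RF = 3 needs 2 responses and tolerates one replica being down. The helper below (hypothetical names; single-datacenter view only, LOCAL_QUORUM and EACH_QUORUM are not modeled) computes how many replicas of a partition may be unavailable for a few common consistency levels.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at CL=QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(rf: int, cl: str) -> int:
    """How many replicas of a partition can be down while requests at
    consistency level `cl` can still succeed (single datacenter)."""
    required = {"ONE": 1, "TWO": 2, "THREE": 3,
                "QUORUM": quorum(rf), "ALL": rf}[cl]
    if required > rf:
        raise ValueError(f"CL {cl} needs {required} replicas but RF is {rf}")
    return rf - required
```

Note that this counts replicas of one partition, not nodes in the cluster: in a 6-node cluster with RF = 3, two failed nodes break QUORUM only for partitions that both of them replicate.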
You should check the health of your Cassandra cluster with nodetool status, and check /var/log/cassandra/system.log for signs of nodes flapping up and down.
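Scanning system.log for flapping can be scripted. This is a minimal sketch, assuming the gossip state changes are logged as lines containing "InetAddress /<address> is now DOWN" or "... is now UP"; verify that format against your Cassandra version's log before relying on it. The function names and the threshold of 3 are hypothetical choices.

```python
import re
from collections import Counter

# ASSUMED log format, e.g. "... InetAddress /10.0.0.2 is now DOWN".
# Newer versions may append a port ("/10.0.0.2:7000"); \S+ captures it too.
STATE_RE = re.compile(r"InetAddress /(\S+) is now (UP|DOWN)")


def count_state_changes(lines):
    """Count UP/DOWN transitions reported per peer address."""
    changes = Counter()
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            changes[(m.group(1), m.group(2))] += 1
    return changes


def flapping_nodes(lines, threshold=3):
    """Peers that were marked DOWN at least `threshold` times, sorted."""
    changes = count_state_changes(lines)
    return sorted(addr for (addr, state), n in changes.items()
                  if state == "DOWN" and n >= threshold)
```

Usage: with open("/var/log/cassandra/system.log") as f: print(flapping_nodes(f)). A node that shows up repeatedly is a good place to look for GC pauses or resource exhaustion.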