
How to resolve "cassandra.cluster.NoHostAvailable" in a multithreaded Python program

I am trying to insert records into Cassandra using a multithreaded Python program, which I run simultaneously on 3 machines. Records are inserted for a while, but then I get the exception below. I am using the driver provided by DataStax.

cassandra.cluster.NoHostAvailable

I searched and found the following (source: https://datastax.github.io/python-driver/api/cassandra/cluster.html):

exception cassandra.cluster.NoHostAvailable
Raised when an operation is attempted but all connections are busy, defunct, closed, or resulted in errors when used.

My questions are:
1. Is this a normal exception to encounter when there are too many connections to Cassandra?
2. How do I resolve this when I want to create many connections/sessions to Cassandra? (I know that creating too many sessions is not advisable; it hurts server performance because each session consumes a noticeable amount of memory.)

Below is the code fragment.

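from multiprocessing.pool import ThreadPool

from cassandra.cluster import Cluster

# One Cluster and one Session, shared by all worker threads
cluster = Cluster(['192.168.1.21'])
session = cluster.connect('myNameSpace')

def insertInToCassandra(catRange):
    for x in catRange:
        pass  # call the function that inserts records into the Cassandra table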

ProductRange = [
    range(900,920),
    range(921,940),
    range(941,960),
    range(961,980),
    range(981,1000)
]

# Make the Pool of workers
pool = ThreadPool(20)

# Run insertInToCassandra on each range in its own thread
# and return the results
results = pool.map(insertInToCassandra, ProductRange)

# Close the pool and wait for the work to finish
pool.close()
pool.join()
Rahul Vishwakarma asked Nov 20 '15 20:11


1 Answer

That's a normal exception that may occur if one or more Cassandra nodes are unavailable, especially if a node goes into a GC spin or otherwise crashes.

Depending on your replication factor (RF) and consistency level (CL), a single node going offline may or may not break the application (with an RF of 3 and a CL of QUORUM, the failure of any single node should be no problem).
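The RF/CL arithmetic can be sketched in a few lines of plain Python. This is only an illustration of the usual quorum rule (a majority of replicas, i.e. RF // 2 + 1); the function names are made up for this example and are not part of the driver.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated failures={tolerated_failures(rf)}")
```

With an RF of 3 the quorum is 2, so one replica can be down. With an RF of 1 or 2, no replica can be down at QUORUM.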

You should check the health of your Cassandra cluster with nodetool status, and check /var/log/cassandra/system.log for signs of nodes flapping up/down.
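If the nodes turn out to be flapping only briefly, the client side can also be made more tolerant. The following is a minimal sketch, not something the driver provides: call_with_retry and its parameters are hypothetical names, and the exception type to retry on is passed in so the helper itself does not depend on the driver.

```python
import time


def call_with_retry(operation, retry_on, attempts=3, base_delay=0.5,
                    sleep=time.sleep):
    """Call operation(); on a retry_on exception, wait and try again.

    The delay doubles after each failed attempt, and the last failure
    is re-raised so the caller still sees the original error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the error
            sleep(base_delay * (2 ** attempt))
```

In the program from the question this could be used as call_with_retry(lambda: session.execute(statement), NoHostAvailable), assuming NoHostAvailable is imported from cassandra.cluster and statement is whatever insert the worker runs. Retrying only papers over short outages; if nodetool status shows nodes staying down, the cluster itself needs attention.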

Jeff Jirsa answered Nov 01 '22 14:11