Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Logging all queries with cassandra-python-driver

I'm trying to find a way to log all queries done on a Cassandra from a python code. Specifically logging as they're done executing using a BatchStatement

Are there any hooks or callbacks I can use to log this?

like image 560
dado_eyad Avatar asked Oct 16 '17 15:10

dado_eyad


People also ask

Can Cassandra be used with the python?

Python module for working with Cassandra database is called Cassandra Driver. It is also developed by Apache foundation. This module contains an ORM API, as well as a core API similar in nature to DB-API for relational databases. Installation of Cassandra driver is easily done using pip utility.

What is Cassandra driver?

A modern, feature-rich and highly tunable Node. js client library for Apache Cassandra and DSE using exclusively Cassandra's binary protocol and Cassandra Query Language.


2 Answers

2 options:

  1. Stick to session.add_request_init_listener

    From the source code:

    a) BoundStatement

    https://github.com/datastax/python-driver/blob/3.11.0/cassandra/query.py#L560

    The passed values are stored in raw_values, you can try to extract it

    b) BatchStatement

    https://github.com/datastax/python-driver/blob/3.11.0/cassandra/query.py#L676

    It stores all the statements and parameters used to construct this object in _statements_and_parameters. Seems it can be fetched although it’s not a public property

    c) Only this hook is called, I didn’t manage to find any other hooks https://github.com/datastax/python-driver/blob/master/cassandra/cluster.py#L2097

    But it has nothing to do with queries actual execution - it's just a way to inspect what kind of queries has been constructed and maybe add additional callbacks/errbacks

  2. Approach it from a different angle and use traces

    https://datastax.github.io/python-driver/faq.html#how-do-i-trace-a-request https://datastax.github.io/python-driver/api/cassandra/cluster.html#cassandra.cluster.ResponseFuture.get_all_query_traces

    Request tracing can be turned on for any request by setting trace=True in Session.execute_async(). View the results by waiting on the future, then ResponseFuture.get_query_trace()

Here's an example of BatchStatement tracing using option 2:

bs = BatchStatement()                                                        
bs.add_all(['insert into test.test(test_type, test_desc) values (%s, %s)',   
            'insert into test.test(test_type, test_desc) values (%s, %s)',   
            'delete from test.test where test_type=%s',
            'update test.test set test_desc=%s where test_type=%s'],
           [['hello1', 'hello1'],                                            
            ['hello2', 'hello2'],                                            
            ['hello2'],
            ['hello100', 'hello1']])     
res = session.execute(bs, trace=True)                                        
trace = res.get_query_trace()                                                
for event in trace.events:                                                   
    if event.description.startswith('Parsing'):                              
        print event.description 

It produces the following output:

Parsing insert into test.test(test_type, test_desc) values ('hello1', 'hello1')
Parsing insert into test.test(test_type, test_desc) values ('hello2', 'hello2')
Parsing delete from test.test where test_type='hello2'
Parsing update test.test set test_desc='hello100' where test_type='hello1'
like image 182
ffeast Avatar answered Oct 02 '22 03:10

ffeast


add_request_init_listener(fn, *args, **kwargs)

Adds a callback with arguments to be called when any request is created.

It will be invoked as fn(response_future, *args, **kwargs) after each client request is created, and before the request is sent*

Using the callback you can easily log all query made by that session.

Example :

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider


class RequestHandler:

    def on_request(self, rf):
        # This callback is invoked each time a request is created, on the thread creating the request.
        # We can use this to count events, or add callbacks
        print(rf.query)


auth_provider = PlainTextAuthProvider(
    username='cassandra',
    password='cassandra'
)

cluster = Cluster(['192.168.65.199'],auth_provider=auth_provider)
session = cluster.connect('test')

handler = RequestHandler()
# each instance will be registered with a session, and receive a callback for each request generated
session.add_request_init_listener(handler.on_request)

from time import sleep

for count in range(1, 10):
    print(count)
    for row in session.execute("select * from kv WHERE key = %s", ["ed1e49e0-266f-11e7-9d76-fd55504093c1"]):
        print row
    sleep(1)
like image 39
Ashraful Islam Avatar answered Oct 02 '22 03:10

Ashraful Islam