Python read Cassandra data into pandas

What is the proper and fastest way to read Cassandra data into pandas? I currently use the following code, but it is very slow:

    import pandas as pd

    from cassandra.cluster import Cluster
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.query import dict_factory

    auth_provider = PlainTextAuthProvider(username=CASSANDRA_USER, password=CASSANDRA_PASS)
    cluster = Cluster(contact_points=[CASSANDRA_HOST], port=CASSANDRA_PORT,
        auth_provider=auth_provider)

    session = cluster.connect(CASSANDRA_DB)
    session.row_factory = dict_factory

    sql_query = "SELECT * FROM {}.{};".format(CASSANDRA_DB, CASSANDRA_TABLE)

    df = pd.DataFrame()

    for row in session.execute(sql_query):
        df = df.append(pd.DataFrame(row, index=[0]))

    df = df.reset_index(drop=True).fillna(pd.np.nan)

Reading 1000 rows takes 1 minute, and I have a "bit more"... If I run the same query in, e.g., DBeaver, I get the whole result set (~40k rows) within a minute.

Thank you!!!

asked Dec 20 '16 16:12

ragesz

1 Answer

I got the answer on the official mailing list (it works perfectly):

Hi,

Try to define your own pandas row factory:

    def pandas_factory(colnames, rows):
        return pd.DataFrame(rows, columns=colnames)

    session.row_factory = pandas_factory
    session.default_fetch_size = None

    query = "SELECT ..."
    rslt = session.execute(query, timeout=None)
    df = rslt._current_rows

That's the way I do it, and it should be faster...

If you find a faster method, I'm interested :)

Michael
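
For readers who want to try the idea without a cluster at hand, here is a minimal sketch of the same approach wrapped in a function. The helper name read_query and the sample rows are made up for illustration and are not part of the driver or the original answer. Note that _current_rows is a private attribute of the driver's result object, so this relies on driver internals that may change between versions.

```python
import pandas as pd


def pandas_factory(colnames, rows):
    """Row factory: build one DataFrame from a whole page of rows."""
    return pd.DataFrame(rows, columns=colnames)


def read_query(session, query):
    """Run `query` on an already connected session and return a DataFrame.

    `read_query` is a hypothetical helper name, not a driver function.
    """
    session.row_factory = pandas_factory
    session.default_fetch_size = None  # disable paging: fetch everything at once
    result = session.execute(query, timeout=None)
    # _current_rows is private; it holds whatever the row factory returned
    # for the current page, which here is the DataFrame.
    return result._current_rows


# The factory itself can be checked without Cassandra, on made-up rows:
sample = pandas_factory(["id", "name"], [(1, "a"), (2, "b")])
print(sample.shape)
```

The likely reason this helps is that the DataFrame is built once from all rows, whereas the loop in the question creates and copies a DataFrame for every single row. With paging disabled the whole result has to fit in memory, so for very large tables it may be safer to keep a fetch size and concatenate the per-page frames instead.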

answered Sep 22 '22 08:09

ragesz
