What is the proper and fastest way to read Cassandra data into pandas? Currently I use the following code, but it's very slow:
```python
import pandas as pd
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import dict_factory

auth_provider = PlainTextAuthProvider(username=CASSANDRA_USER, password=CASSANDRA_PASS)
cluster = Cluster(contact_points=[CASSANDRA_HOST], port=CASSANDRA_PORT,
                  auth_provider=auth_provider)

session = cluster.connect(CASSANDRA_DB)
session.row_factory = dict_factory  # each row is returned as a dict

sql_query = "SELECT * FROM {}.{};".format(CASSANDRA_DB, CASSANDRA_TABLE)

df = pd.DataFrame()
for row in session.execute(sql_query):
    # Appending one row at a time copies the whole DataFrame on every iteration
    df = df.append(pd.DataFrame(row, index=[0]))

df = df.reset_index(drop=True).fillna(pd.np.nan)
```
Reading 1000 rows takes 1 minute, and I have a "bit more"... If I run the same query, e.g. in DBeaver, I get the whole result (~40k rows) within a minute.
Thank you!!!
Fastest way to read Cassandra data into pandas with automatic iteration of pages: create a dictionary, add each row to it while the driver automatically iterates all pages, then create the DataFrame from this dictionary in one step.
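A minimal sketch of that approach, assuming the DataStax `cassandra-driver` and a `session` connected as in the question. Iterating a driver `ResultSet` fetches the following pages transparently, so all rows are collected first and the DataFrame is built once. The helper names `rows_to_dataframe` and `read_table` and the `fetch_size=5000` value are hypothetical choices for illustration, and the rows are collected as a list of dicts (from `dict_factory`).

```python
import pandas as pd


def rows_to_dataframe(rows):
    """Build a DataFrame from an iterable of dict rows in a single step.

    When `rows` is a cassandra-driver ResultSet, iterating it fetches
    the following pages automatically, so the whole result is read.
    """
    # Collect every row first; constructing the DataFrame once avoids
    # copying the frame again for each appended row.
    return pd.DataFrame(list(rows))


def read_table(session, keyspace, table, fetch_size=5000):
    """Hypothetical helper: read a whole table into pandas with paging."""
    # Imported here so rows_to_dataframe can be used without the driver.
    from cassandra.query import SimpleStatement, dict_factory

    session.row_factory = dict_factory  # each row becomes a dict
    # fetch_size sets how many rows each page holds (assumed value).
    query = SimpleStatement(
        "SELECT * FROM {}.{};".format(keyspace, table), fetch_size=fetch_size
    )
    return rows_to_dataframe(session.execute(query))
```

Usage, with the names from the question: `df = read_table(session, CASSANDRA_DB, CASSANDRA_TABLE)`.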
Cassandra has its own query language called Cassandra Query Language (CQL). CQL queries can be executed from inside the cqlsh shell, similar to the MySQL or SQLite shell. The CQL syntax looks similar to standard SQL. The Python module for working with a Cassandra database is the Cassandra driver (`cassandra-driver`).
Cassandra is a NoSQL database that works as a partitioned key-value (wide-column) store. Some features of the Cassandra data model: data is stored as a set of rows organized into tables, and tables are also called column families.
I got the answer on the official mailing list (it works perfectly):
Hi,
try to define your own pandas row factory:
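```python
import pandas as pd

def pandas_factory(colnames, rows):
    # Turn each page of results into a DataFrame directly
    return pd.DataFrame(rows, columns=colnames)

session.row_factory = pandas_factory
session.default_fetch_size = None  # disable paging: the whole result is one page

query = "SELECT ..."
rslt = session.execute(query, timeout=None)  # no client-side timeout
df = rslt._current_rows  # private attribute holding the DataFrame
```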
That's the way I do it, and it should be faster.
If you find a faster method, I'm interested :)
Michael
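If disabling paging with `default_fetch_size = None` makes a large query time out, a possible variant keeps paging enabled and concatenates the per-page DataFrames produced by the same `pandas_factory`. This is a sketch assuming the driver's `ResultSet.has_more_pages` and `fetch_next_page()`, plus the private `_current_rows` attribute used in the answer above (private, so it may change between driver versions); `collect_pages` is a hypothetical helper name.

```python
import pandas as pd


def pandas_factory(colnames, rows):
    # Row factory from the answer above: each page becomes a DataFrame.
    return pd.DataFrame(rows, columns=colnames)


def collect_pages(result_set):
    """Concatenate every page of a ResultSet produced with pandas_factory."""
    frames = []
    while True:
        page = result_set._current_rows  # DataFrame for the current page
        if isinstance(page, pd.DataFrame):
            frames.append(page)
        if not result_set.has_more_pages:
            break
        result_set.fetch_next_page()  # blocks until the next page arrives
    # One concat at the end instead of growing a DataFrame repeatedly.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Usage, assuming `session` is connected as in the question and keeps its default fetch size: `session.row_factory = pandas_factory` and then `df = collect_pages(session.execute(query))`.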