Create Cassandra Table from pyspark DataFrame

I'm using Apache Spark 2.2.1 with Cassandra 3.11 and the Datastax spark-cassandra-connector from python/pyspark.

I would like to create a Cassandra table from the structure of a Dataset. I found a createCassandraTable function in the connector's DataSetFunction package for Java/Scala, but I cannot find its counterpart in the pyspark package. This is a similar question in Java.

I am trying something like this:

dataset.createCassandraTable('examples', 'table_example', partitionKeyColumns = ['id'])

but createCassandraTable is not a method of the pyspark Dataset/DataFrame.

I know I could run a raw CQL CREATE TABLE statement from Spark, but I would like to do it dynamically and programmatically. Generating the statement myself, with a mapping between Spark and Cassandra types, is still an alternative.
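To illustrate the mapping idea, this is a minimal sketch of what I have in mind (the SPARK_TO_CQL dictionary and the build_create_table_cql helper are hypothetical names of mine; they cover only simple types and ignore clustering columns):

from pyspark.sql import types as T

# Assumed mapping from Spark SQL types to CQL types (simple types only)
SPARK_TO_CQL = {
    T.StringType: "text",
    T.IntegerType: "int",
    T.LongType: "bigint",
    T.FloatType: "float",
    T.DoubleType: "double",
    T.BooleanType: "boolean",
    T.TimestampType: "timestamp",
}

def build_create_table_cql(df, keyspace, table, partition_key_columns):
    # Translate each schema field into a "name cql_type" column definition
    columns = ", ".join("%s %s" % (f.name, SPARK_TO_CQL[type(f.dataType)])
                        for f in df.schema.fields)
    primary_key = ", ".join(partition_key_columns)
    return ("CREATE TABLE IF NOT EXISTS %s.%s (%s, PRIMARY KEY ((%s)))"
            % (keyspace, table, columns, primary_key))

# e.g. build_create_table_cql(df, "examples", "table_example", ["id"])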

Has anyone dealt with this, or have other ideas? Is there a Spark SQL alternative?

asked Jan 27 '26 by Juan Antonio Aguilar

1 Answer

I am also facing the same issue, but I think there is a way to do it: use a driver, say cassandra-driver for Python.

We can collect the required column fields from the DataFrame schema using the available methods and create the table programmatically at run time.
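For example, a rough sketch (the create_cassandra_table helper is my own; it assumes the cassandra-driver package is installed, a cluster is reachable on 127.0.0.1, and the schema uses simple column types only):

from cassandra.cluster import Cluster

# Assumed Spark-to-CQL type-name mapping; extend for your schema
CQL_TYPES = {"StringType": "text", "IntegerType": "int", "LongType": "bigint",
             "DoubleType": "double", "BooleanType": "boolean",
             "TimestampType": "timestamp"}

def create_cassandra_table(df, keyspace, table, partition_keys,
                           hosts=("127.0.0.1",)):
    # Collect column name/type pairs from the DataFrame schema
    columns = ", ".join("%s %s" % (f.name, CQL_TYPES[type(f.dataType).__name__])
                        for f in df.schema.fields)
    cql = ("CREATE TABLE IF NOT EXISTS %s.%s (%s, PRIMARY KEY ((%s)))"
           % (keyspace, table, columns, ", ".join(partition_keys)))
    cluster = Cluster(list(hosts))
    session = cluster.connect()
    session.execute(cql)          # run the generated DDL
    cluster.shutdown()

# e.g. create_cassandra_table(df, "test", "kv", ["id"])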

After that, we can store the data with the following code:

df.write.format("org.apache.spark.sql.cassandra").mode('append').options(table="kv", keyspace="test").save()
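Note that this write assumes the spark-cassandra-connector is on the Spark classpath, e.g. pyspark started with --packages com.datastax.spark:spark-cassandra-connector_2.11:<version matching your Spark build>.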
answered Jan 29 '26 by P K Lenka


