Create Cassandra Table from pyspark DataFrame

I'm using Apache Spark 2.2.1 with Cassandra 3.11 and the Datastax spark-cassandra-connector from python/pyspark.

I would like to create a Cassandra table from the structure of a Dataset. I found a createCassandraTable function in the connector's DataSetFunction package for Java/Scala, but I cannot find its counterpart in the pyspark package. This is a similar question in Java.

I am trying something like this:

dataset.createCassandraTable('examples', 'table_example', partitionKeyColumns = ['id'])

but createCassandraTable is not a method of the pyspark Dataset/DataFrame.

I know I could run a raw CQL CREATE TABLE statement from Spark, but I would like to do it dynamically and programmatically. Generating the statement myself, with a mapping between Spark and Cassandra types, is still an alternative.
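To illustrate the mapping idea, this is a minimal sketch of what I have in mind (the SPARK_TO_CQL dictionary and the build_create_table_cql helper are hypothetical names of mine; they cover only simple types and ignore clustering columns):

from pyspark.sql import types as T

# Assumed mapping from Spark SQL types to CQL types (simple types only)
SPARK_TO_CQL = {
    T.StringType: "text",
    T.IntegerType: "int",
    T.LongType: "bigint",
    T.FloatType: "float",
    T.DoubleType: "double",
    T.BooleanType: "boolean",
    T.TimestampType: "timestamp",
}

def build_create_table_cql(df, keyspace, table, partition_key_columns):
    # Translate each schema field into a "name cql_type" column definition
    columns = ", ".join("%s %s" % (f.name, SPARK_TO_CQL[type(f.dataType)])
                        for f in df.schema.fields)
    primary_key = ", ".join(partition_key_columns)
    return ("CREATE TABLE IF NOT EXISTS %s.%s (%s, PRIMARY KEY ((%s)))"
            % (keyspace, table, columns, primary_key))

# e.g. build_create_table_cql(df, "examples", "table_example", ["id"])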

Has anyone dealt with this, or have other ideas? Is there a Spark SQL alternative?

asked Jan 27 '26 by Juan Antonio Aguilar

1 Answer

I am also facing the same issue, but I think there is a way to do it: use a driver, say cassandra-driver for Python.

We can collect the required column fields from the DataFrame schema using the available methods and create the table programmatically at run time.
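For example, a rough sketch (the create_cassandra_table helper is my own; it assumes the cassandra-driver package is installed, a cluster is reachable on 127.0.0.1, and the schema uses simple column types only):

from cassandra.cluster import Cluster

# Assumed Spark-to-CQL type-name mapping; extend for your schema
CQL_TYPES = {"StringType": "text", "IntegerType": "int", "LongType": "bigint",
             "DoubleType": "double", "BooleanType": "boolean",
             "TimestampType": "timestamp"}

def create_cassandra_table(df, keyspace, table, partition_keys,
                           hosts=("127.0.0.1",)):
    # Collect column name/type pairs from the DataFrame schema
    columns = ", ".join("%s %s" % (f.name, CQL_TYPES[type(f.dataType).__name__])
                        for f in df.schema.fields)
    cql = ("CREATE TABLE IF NOT EXISTS %s.%s (%s, PRIMARY KEY ((%s)))"
           % (keyspace, table, columns, ", ".join(partition_keys)))
    cluster = Cluster(list(hosts))
    session = cluster.connect()
    session.execute(cql)          # run the generated DDL
    cluster.shutdown()

# e.g. create_cassandra_table(df, "test", "kv", ["id"])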

After that, we can store the data with the following code:

df.write.format("org.apache.spark.sql.cassandra").mode('append').options(table="kv", keyspace="test").save()
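Note that this write assumes the spark-cassandra-connector is on the Spark classpath, e.g. pyspark started with --packages com.datastax.spark:spark-cassandra-connector_2.11:<version matching your Spark build>.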
answered Jan 29 '26 by P K Lenka


