I'm using Apache Spark 2.2.1 with Cassandra 3.11 and Datastax spark-cassandra-connector from python/pyspark.
I would like to create a Cassandra table from a dataset's structure. I found a createCassandraTable function in the DataSetFunction package of the Java API, but I cannot find an equivalent in the pyspark package. There is a similar question for Java.
I am trying something like this:
dataset.createCassandraTable('examples', 'table_example', partitionKeyColumns = ['id'])
but createCassandraTable is not a method of the PySpark dataset/dataframe.
I know that I could run a raw CQL CREATE TABLE statement from Spark, but I would like to do this dynamically and programmatically. That would be an alternative, though, perhaps with a mapping between Spark and Cassandra types.
Any experience here, or new ideas? Is there a Spark SQL alternative?
We can collect the required column fields from the DataFrame's schema using the available methods and create the table programmatically at run time.
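For example, here is a minimal sketch of that idea, assuming the DataStax Python driver (cassandra-driver) is available to execute the CQL and that the schema contains only common scalar types; the type mapping and contact points below are assumptions you would adjust for your cluster:

from cassandra.cluster import Cluster

# Assumed mapping from Spark SQL simple type names to CQL type names.
# Extend this for any additional types your schema uses.
SPARK_TO_CQL = {
    "string": "text",
    "int": "int",
    "bigint": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "timestamp": "timestamp",
    "date": "date",
}

def build_create_table_cql(df, keyspace, table, partition_key_columns):
    """Generate a CREATE TABLE statement from the DataFrame's schema."""
    column_defs = ", ".join(
        "{} {}".format(f.name, SPARK_TO_CQL[f.dataType.simpleString()])
        for f in df.schema.fields
    )
    primary_key = ", ".join(partition_key_columns)
    return "CREATE TABLE IF NOT EXISTS {}.{} ({}, PRIMARY KEY ({}))".format(
        keyspace, table, column_defs, primary_key
    )

# Hypothetical usage, assuming df is the DataFrame from the question:
# creates examples.table_example with 'id' as the partition key.
cluster = Cluster(["127.0.0.1"])   # adjust contact points to your cluster
session = cluster.connect()
session.execute(build_create_table_cql(df, "examples", "table_example", ["id"]))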
After that, we can store the data using the following code:
df.write.format("org.apache.spark.sql.cassandra").mode('append').options(table="kv", keyspace="test").save()
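Note that for the connector write to succeed, Spark needs to know how to reach Cassandra. This is typically set through spark.cassandra.connection.host when building the SparkSession (the host below is an assumption for your environment):

from pyspark.sql import SparkSession

# Point the spark-cassandra-connector at your Cassandra contact point.
spark = SparkSession.builder \
    .appName("cassandra-example") \
    .config("spark.cassandra.connection.host", "127.0.0.1") \
    .getOrCreate()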