How to write streaming Dataset to Cassandra?

I have a streaming DataFrame df in PySpark that holds all the data I want to write to a Cassandra table using the spark-cassandra-connector. I've tried doing this in two ways:

df.write \
    .format("org.apache.spark.sql.cassandra") \
    .mode('append') \
    .options(table="myTable",keyspace="myKeySpace") \
    .save() 

query = df.writeStream \
    .format("org.apache.spark.sql.cassandra") \
    .outputMode('append') \
    .options(table="myTable",keyspace="myKeySpace") \
    .start()

query.awaitTermination()

However, I keep getting these errors, respectively:

pyspark.sql.utils.AnalysisException: "'write' can not be called on streaming Dataset/DataFrame;

and

java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.cassandra does not support streamed writing.

Is there any way I can write my streaming DataFrame to my Cassandra table?

user2361174 asked Jul 15 '17 01:07


1 Answer

There is currently no streaming Sink for Cassandra in the Spark Cassandra Connector. You will need to implement your own Sink or wait for it to become available.
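For readers on later Spark releases (2.4 and up), one way to avoid writing a full Sink might be foreachBatch, which hands each micro-batch to a function as an ordinary, non-streaming DataFrame, so the batch write from the question can be reused. This is only a sketch: it assumes Spark 2.4+ and a connector version that matches it, and the function name write_batch_to_cassandra is made up for illustration.

```python
def write_batch_to_cassandra(batch_df, batch_id):
    """Write one micro-batch with the connector's regular batch writer.

    batch_df is a plain (non-streaming) DataFrame, so .write is allowed here.
    batch_id is supplied by Spark and is unused in this sketch.
    """
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .mode("append")
        .options(table="myTable", keyspace="myKeySpace")
        .save())


# Hooking it up (needs Spark 2.4+ and the streaming DataFrame df from the question):
# query = df.writeStream.foreachBatch(write_batch_to_cassandra).start()
# query.awaitTermination()
```

The function is called once per micro-batch on the driver, so a failed batch may be retried and written again. Since Cassandra inserts are upserts on the primary key, a repeated batch would normally overwrite the same rows rather than duplicate them.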

If you were using Scala or Java, you could use the foreach operator with a ForeachWriter, as described in the "Using Foreach" section of the Structured Streaming Programming Guide.
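Since Spark 2.4, PySpark's writeStream.foreach also accepts an object with a process method and optional open and close methods, which mirrors the ForeachWriter shape. Below is a minimal sketch of such a writer, not a tested production sink. It assumes the DataStax cassandra-driver package is installed on the executors; the class name CassandraRowWriter, the helper make_session, the contact point 127.0.0.1, and the column names id and value are all hypothetical.

```python
class CassandraRowWriter:
    """Row-at-a-time writer in the open/process/close shape used by foreach."""

    def __init__(self, connect, keyspace, table, columns):
        # connect: zero-argument callable returning a session-like object
        # with execute(query, params) and shutdown(). It must be picklable,
        # because Spark ships this writer object to the executors.
        self.connect = connect
        self.columns = list(columns)
        placeholders = ", ".join(["%s"] * len(self.columns))
        self.insert_cql = "INSERT INTO {}.{} ({}) VALUES ({})".format(
            keyspace, table, ", ".join(self.columns), placeholders)
        self.session = None

    def open(self, partition_id, epoch_id):
        # Called once per partition and epoch; returning True means
        # "go ahead and process the rows of this partition".
        self.session = self.connect()
        return True

    def process(self, row):
        # Pull the configured columns out of the row and insert them.
        values = tuple(row[c] for c in self.columns)
        self.session.execute(self.insert_cql, values)

    def close(self, error):
        # Called after the partition is done; error is None on success.
        if self.session is not None:
            self.session.shutdown()
            self.session = None


def make_session():
    # Assumption: cassandra-driver is available and a node listens locally.
    from cassandra.cluster import Cluster
    return Cluster(["127.0.0.1"]).connect()


# Hooking it up (needs Spark 2.4+ and the streaming DataFrame df from the question):
# query = df.writeStream.foreach(
#     CassandraRowWriter(make_session, "myKeySpace", "myTable", ["id", "value"])
# ).start()
```

Opening the connection in open rather than in the constructor keeps the writer serializable, since the constructor runs on the driver while open runs on the executor that actually writes the rows.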

RussS answered Oct 06 '22 12:10
