I have a stream-sourced DataFrame df in Python (PySpark) that holds all the data I want to write into a Cassandra table with the spark-cassandra-connector. I've tried doing this in two ways, first with the batch writer and then with the streaming writer:
# Attempt 1: the batch write API
df.write \
.format("org.apache.spark.sql.cassandra") \
.mode('append') \
.options(table="myTable", keyspace="myKeySpace") \
.save()
# Attempt 2: the streaming write API
query = df.writeStream \
.format("org.apache.spark.sql.cassandra") \
.outputMode('append') \
.options(table="myTable", keyspace="myKeySpace") \
.start()
query.awaitTermination()
However, I keep getting these errors, respectively:
pyspark.sql.utils.AnalysisException: "'write' can not be called on streaming Dataset/DataFrame;
and
java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.cassandra does not support streamed writing.
Is there any way I can send my streamed DataFrame into my Cassandra table?
There is currently no streaming Sink for Cassandra in the Spark Cassandra Connector, so you will need to implement your own Sink or wait for one to become available.
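If you go the custom-Sink route in Scala, the usual pattern is to implement Spark's internal Sink and StreamSinkProvider interfaces and write each micro-batch with the connector's batch writer. The sketch below is only an illustration of that pattern under those assumptions: the class and package names are made up, and rebuilding the DataFrame from its RDD is a workaround sometimes needed so the batch write is not rejected for being backed by a streaming plan.

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// Hypothetical sink: writes each micro-batch to Cassandra with the
// connector's batch writer. Class and package names are made up.
class CassandraSink(keyspace: String, table: String) extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Rebuild the DataFrame from its RDD so the batch write does not
    // see a streaming-flagged plan.
    val batch = data.sparkSession.createDataFrame(data.rdd, data.schema)
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> keyspace, "table" -> table))
      .mode("append")
      .save()
  }
}

// Provider that Structured Streaming instantiates when the sink is
// referenced by its fully qualified class name in writeStream.format(...).
class CassandraSinkProvider extends StreamSinkProvider {
  override def createSink(sqlContext: SQLContext,
                          parameters: Map[String, String],
                          partitionColumns: Seq[String],
                          outputMode: OutputMode): Sink =
    new CassandraSink(parameters("keyspace"), parameters("table"))
}

You would then start the query with something like df.writeStream.format("com.example.CassandraSinkProvider").option("keyspace", "myKeySpace").option("table", "myTable").start(), where com.example.CassandraSinkProvider is whatever fully qualified name you give your provider class.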
If you were using Scala or Java, you could use the foreach operator with a ForeachWriter, as described in Using Foreach.
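For reference, a minimal ForeachWriter along those lines might look like the following Scala sketch. The contact point, the myKeySpace.myTable(id int, value text) schema, and the INSERT statement are assumptions you would replace with your own; it opens one driver session per partition and writes rows one at a time.

import com.datastax.driver.core.{Cluster, Session}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical writer: assumes a table myKeySpace.myTable(id int, value text)
// and a Cassandra node on 127.0.0.1; adapt the schema and INSERT to your data.
class CassandraForeachWriter extends ForeachWriter[Row] {
  private var cluster: Cluster = _
  private var session: Session = _

  // Called once per partition; return true to process its rows.
  override def open(partitionId: Long, version: Long): Boolean = {
    cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    session = cluster.connect()
    true
  }

  // Called for every row in the partition.
  override def process(row: Row): Unit = {
    session.execute(
      s"INSERT INTO myKeySpace.myTable (id, value) " +
      s"VALUES (${row.getInt(0)}, '${row.getString(1)}')")
  }

  // Called when the partition has been processed (or has failed).
  override def close(errorOrNull: Throwable): Unit = {
    if (session != null) session.close()
    if (cluster != null) cluster.close()
  }
}

val query = df.writeStream
  .outputMode("append")
  .foreach(new CassandraForeachWriter)
  .start()

query.awaitTermination()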