I have a scenario where I receive streaming data that is processed by my Spark Streaming program, and the output for each interval is appended to an existing Cassandra table.
Currently my Spark Streaming program generates a DataFrame that I need to save to my Cassandra table. The problem I'm facing is that I'm not able to append data/rows to my existing Cassandra table when I use the command below:
dff.write.format("org.apache.spark.sql.cassandra").options(Map("table" -> "xxx", "keyspace" -> "retail")).save()
I read the following link, http://rustyrazorblade.com/2015/08/migrating-from-mysql-to-cassandra-using-spark/, where the author passes mode="append" to the save method, but that throws a syntax error for me.
I also wasn't able to work out what I need to change from this thread: https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/rlGGWQF2wnM
I need help fixing this issue. I'm writing my Spark Streaming jobs in Scala.
I think you have to do it the following way:
import org.apache.spark.sql.SaveMode

dff.write.format("org.apache.spark.sql.cassandra").mode(SaveMode.Append).options(Map("table" -> "xxx", "keyspace" -> "retail")).save()
The way Cassandra handles data forces you to do so-called 'upserts': you have to remember that an insert may overwrite rows whose primary key matches the primary key of the record being inserted. Cassandra is a 'write-fast' database, so it does not check for existing data before writing.
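For completeness, here is a minimal, self-contained sketch of the whole write path. It assumes Spark 2.x with the DataStax spark-cassandra-connector on the classpath, a Cassandra node on localhost, and a hypothetical retail.purchases table whose schema and primary key match the DataFrame; swap in your own host, keyspace, and table names.

import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendToCassandra {
  def main(args: Array[String]): Unit = {
    // Adjust the host to point at your Cassandra cluster.
    val spark = SparkSession.builder()
      .appName("cassandra-append-example")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    import spark.implicits._

    // Stand-in for the DataFrame your streaming job produces each interval.
    val dff = Seq(("p1", "widget", 3), ("p2", "gadget", 1))
      .toDF("id", "product", "quantity")

    // SaveMode.Append adds the new rows to the existing table; rows whose
    // primary key already exists are overwritten (Cassandra upsert semantics).
    dff.write
      .format("org.apache.spark.sql.cassandra")
      .mode(SaveMode.Append)
      .options(Map("table" -> "purchases", "keyspace" -> "retail"))
      .save()

    spark.stop()
  }
}

In a streaming job you would issue the same dff.write call on the DataFrame produced for each interval, so every micro-batch is appended to the table.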