How to write streaming data to S3?

I want to write an RDD[String] to Amazon S3 from Spark Streaming using Scala. The strings are JSON documents, and I am not sure how to do this efficiently. I found this post, in which the spark-s3 library is used. The idea is to create a SparkContext and then an SQLContext. After that, the author of the post does something like this:

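// toDF() requires `import sqlContext.implicits._` to be in scope
myDstream.foreachRDD { rdd =>
  rdd.toDF().write
    .format("com.knoldus.spark.s3")       // spark-s3 data source
    .option("accessKey", "s3_access_key") // placeholder credentials
    .option("secretKey", "s3_secret_key")
    .option("bucket", "bucket_name")
    .option("fileType", "json")
    .save("sample.json")                  // same file name for every batch
}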

What other options are there besides spark-s3? Is it possible to append the streaming data to an existing file on S3?

Lobsterrrr asked Oct 28 '25 15:10


1 Answer

Files on S3 cannot be appended to. In S3, an "append" means replacing the existing object with a new object that contains the old data plus the additional data.
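To make the difference concrete, here is a minimal sketch that uses a toy in-memory store instead of S3. `InMemoryObjectStore`, `emulateAppend`, `batchKey` and `writeBatch` are hypothetical names invented for this illustration and are not part of any S3 or Spark API. It assumes only what is stated above: a write replaces the whole object. It shows an emulated "append" (read, concatenate, overwrite) next to the alternative of giving every batch its own key.

```scala
// Toy in-memory stand-in for an object store (hypothetical, illustration only).
// Its only write operation, put(), replaces the whole object stored under a key.
final class InMemoryObjectStore {
  private val objects = scala.collection.mutable.Map.empty[String, String]

  def put(key: String, body: String): Unit = objects.update(key, body)
  def get(key: String): Option[String] = objects.get(key)
  def keys: Seq[String] = objects.keys.toSeq.sorted
}

object AppendSketch {
  // One JSON string per line, each terminated by a newline.
  private def render(lines: Seq[String]): String = lines.map(_ + "\n").mkString

  // Emulated "append": read the existing object, concatenate, overwrite it.
  // The amount of data rewritten grows with the size of the object.
  def emulateAppend(store: InMemoryObjectStore, key: String, lines: Seq[String]): Unit = {
    val existing = store.get(key).getOrElse("")
    store.put(key, existing + render(lines))
  }

  // Alternative: never rewrite anything, give each batch its own key.
  def batchKey(prefix: String, batchMillis: Long): String =
    s"$prefix/batch-$batchMillis.json"

  // Writes one batch as a new object; empty batches are skipped.
  def writeBatch(store: InMemoryObjectStore, prefix: String,
                 batchMillis: Long, lines: Seq[String]): Unit =
    if (lines.nonEmpty) store.put(batchKey(prefix, batchMillis), render(lines))
}
```

In Spark Streaming, the two-argument form of `foreachRDD` passes the batch `Time` along with the RDD, so its `milliseconds` value could play the role of `batchMillis` here.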

jzonthemtn answered Oct 31 '25 06:10