If you follow the AWS Glue Add Job wizard to create a script that writes Parquet files to S3, you end up with generated code something like this:
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=dropnullfields3,
    connection_type="s3",
    connection_options={"path": "s3://my-s3-bucket/datafile.parquet"},
    format="parquet",
    transformation_ctx="datasink4",
)
Is it possible to specify a KMS key so that the data is encrypted in the bucket?
In a Glue Scala job:
val spark: SparkContext = new SparkContext()
val glueContext: GlueContext = new GlueContext(spark)
spark.hadoopConfiguration.set("fs.s3.enableServerSideEncryption", "true")
spark.hadoopConfiguration.set("fs.s3.serverSideEncryption.kms.keyId", args("ENCRYPTION_KEY"))
I think the syntax differs for Python, but the idea is the same.
To spell out the answer using PySpark, you can do either
from pyspark.conf import SparkConf
from pyspark.context import SparkContext
[...]
# Set the SSE-KMS options on the Hadoop configuration via Spark conf
spark_conf = SparkConf().setAll([
  ("spark.hadoop.fs.s3.enableServerSideEncryption", "true"),
  ("spark.hadoop.fs.s3.serverSideEncryption.kms.keyId", "<Your Key ID>")
])
sc = SparkContext(conf=spark_conf)
noting the spark.hadoop prefix, or (uglier but shorter):
sc._jsc.hadoopConfiguration().set("fs.s3.enableServerSideEncryption", "true")
sc._jsc.hadoopConfiguration().set("fs.s3.serverSideEncryption.kms.keyId", "<Your Key ID>")
where sc is your current SparkContext.
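Putting the two together in a Glue PySpark job, a minimal sketch might look like the following (assuming the KMS key ID is passed in as a hypothetical --ENCRYPTION_KEY job parameter, mirroring the Scala answer, and that dropnullfields3 comes from the elided ETL steps of the generated script):

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# "ENCRYPTION_KEY" is a hypothetical job argument holding the KMS key ID
args = getResolvedOptions(sys.argv, ["JOB_NAME", "ENCRYPTION_KEY"])

sc = SparkContext()
# Enable SSE-KMS for s3:// writes before any data is written
sc._jsc.hadoopConfiguration().set("fs.s3.enableServerSideEncryption", "true")
sc._jsc.hadoopConfiguration().set("fs.s3.serverSideEncryption.kms.keyId", args["ENCRYPTION_KEY"])

glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# ... ETL steps from the generated script that produce dropnullfields3 ...

datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=dropnullfields3,
    connection_type="s3",
    connection_options={"path": "s3://my-s3-bucket/datafile.parquet"},
    format="parquet",
    transformation_ctx="datasink4",
)
job.commit()

The key point is that the Hadoop configuration has to be set before the write runs; the write_dynamic_frame call itself is unchanged from the generated script.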