I use dynamic frames to write a Parquet file to S3, but if a file already exists my program appends a new file instead of replacing it. The statement I use is this:
glueContext.write_dynamic_frame.from_options(
    frame=table,
    connection_type="s3",
    connection_options={"path": output_dir,
                        "partitionKeys": ["var1", "var2"]},
    format="parquet")
Is there anything like "mode": "overwrite" that replaces my Parquet files?
When you create AWS Glue jobs, you can use either the IAM role that is attached or an existing role. The Python code uses the Pandas and PyArrow libraries to convert data to Parquet; the Pandas library is already available.
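For a sense of what that conversion looks like outside of Spark, here is a minimal sketch using Pandas with the PyArrow engine; both file paths are hypothetical:

import pandas as pd

# Read a CSV file and rewrite it as Parquet via the PyArrow engine.
# "input.csv" and "output.parquet" are placeholder paths.
df = pd.read_csv("input.csv")
df.to_parquet("output.parquet", engine="pyarrow", index=False)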
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on the fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.
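As an illustration, a DynamicFrame can be built from the Glue Data Catalog and its inferred schema inspected; the database, table, and column names below are hypothetical:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Glue infers the schema when the DynamicFrame is read, not up front.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table")

dyf.printSchema()  # columns with inconsistent types appear as a choice type

# Resolve a choice type explicitly, e.g. cast an ambiguous column to long.
resolved = dyf.resolveChoice(specs=[("id", "cast:long")])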
Yes, you can convert CSV/JSON files to Parquet using AWS Glue, but that is not the only conversion supported: Glue can also write other formats such as ORC, Avro, JSON, and CSV. A sketch of the CSV-to-Parquet case follows.
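The conversion uses the same from_options API shown in the question, reading CSV and writing Parquet; both S3 paths here are hypothetical:

# Read CSV from S3 into a DynamicFrame, then write it back out as Parquet.
csv_dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/raw/"]},
    format="csv",
    format_options={"withHeader": True})

glueContext.write_dynamic_frame.from_options(
    frame=csv_dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/parquet/"},
    format="parquet")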
AWS Glue generates the required Python or Scala code, which you can customize to fit your data transformation needs. In the Advanced properties section, choose Enable in the Job bookmark list to avoid reprocessing old data.
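Note that a bookmark only takes effect if the script initializes and commits a Job object; a minimal skeleton (the argument handling is standard Glue boilerplate) looks roughly like this:

import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())

# init() and commit() bracket the run so the bookmark can record progress.
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# ... ETL reads and writes go here ...

job.commit()  # persists bookmark state for the next run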
Currently AWS Glue doesn't support an 'overwrite' mode, but they are working on this feature.
As a workaround, you can convert the DynamicFrame to a Spark DataFrame and write it with Spark instead of Glue:
(table.toDF()
    .write
    .mode("overwrite")
    .format("parquet")
    .partitionBy("var1", "var2")
    .save(output_dir))
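Be aware that mode("overwrite") replaces everything under output_dir, not just the partitions being written in the current run. If you only want to refresh those partitions, Spark 2.3+ offers a dynamic partition overwrite setting; this is a Spark option, not a Glue one, and in a Glue script the session is reachable via glueContext.spark_session:

# Overwrite only the partitions contained in this write; leave others intact.
spark = glueContext.spark_session
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")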