Why do I need to set the `transformation_ctx` parameter when calling transformation and sink operations for AWS Glue bookmark to work?

Tags: amazon-web-services, aws-glue

The AWS Glue bookmark documentation (https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html) seems to suggest that one has to pass a transformation_ctx parameter to source, transform and sink operations for the bookmark to work. This is reflected in the sample code on that page, where the calls to create_dynamic_frame.from_catalog(), ApplyMapping.apply() and write_dynamic_frame.from_options() are all given a transformation_ctx value.

I can understand the point of passing transformation_ctx to the create_dynamic_frame.from_catalog() method, since AWS Glue needs to store information about the files that have already been read in the bookmark under the given transformation_ctx key.

However, I don't understand why this is also necessary for methods like ApplyMapping.apply() and write_dynamic_frame.from_options(). To put it another way, what is the state information these operations need to store in the bookmark? If I don't pass transformation_ctx to these methods, what problems will this cause?

asked Jun 24 '20 05:06

victorx

1 Answer

I had the same doubts about bookmarking months ago (October 2019), and since the documentation provided by Amazon is not very clear, I opened a support case to better understand how it is implemented.

My Glue job consisted of:

A read function from S3 (glue_context.create_dynamic_frame.from_options)
A ResolveChoice.apply
A write function to Redshift (glue_context.write_dynamic_frame.from_jdbc_conf)

All of these operations had a transformation_ctx value. I tested different possible behaviours (the same transformation_ctx for all, different ones, fixed values, dynamic values, etc.).

After many messages with AWS support, they confirmed that bookmarking works only on the read function (they also said only with S3 as a source, but I didn't test that). So I asked whether transformation_ctx is useless in ResolveChoice (and in the write function too), and they said yes: it doesn't make any difference.

Furthermore, for the write function it doesn't change anything: there is no bookmark logic there, so a write is not skipped just because it has already run before.
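As a rough sketch of what that implies, here is a job body that sets transformation_ctx only on the read and leaves it off the write. It reflects what support told me and is not an official recommendation. The function name run_job, the context name "read_events", the formats and the bucket paths are all made up for illustration. The function only calls methods on the objects passed in, so it can be dry-run locally with mocks, without the awsglue library.

```python
from unittest.mock import MagicMock  # only needed for the local dry run


def run_job(glue_context, job, source_path, target_path):
    """Read from S3, write back to S3, then commit the job."""
    # Source read: transformation_ctx is the key under which the bookmark
    # state for this source is tracked.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [source_path]},
        format="json",
        transformation_ctx="read_events",  # hypothetical name
    )
    # Sink write: no transformation_ctx, following the support answer that
    # it makes no difference here.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": target_path},
        format="parquet",
    )
    # The bookmark state is updated when the job commits.
    job.commit()
    return frame


if __name__ == "__main__":
    # Dry run with mocks standing in for GlueContext and Job.
    ctx, job = MagicMock(), MagicMock()
    run_job(ctx, job, "s3://example-bucket/in/", "s3://example-bucket/out/")
    read_call = ctx.create_dynamic_frame.from_options.call_args
    print(read_call.kwargs["transformation_ctx"])
```

In a real job, glue_context would be a GlueContext and job a Job that has already been initialized with job.init(...), and bookmarks would also need to be enabled on the job itself.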

answered Sep 30 '22 13:09

Hyruma92
