Many methods in the API accept the transformation_ctx parameter, with a default value of "".
Is it just a string marker? If so, what is its purpose?
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.
abstract class DataSink. The writer analog to a DataSource. DataSink encapsulates a destination and a format that a DynamicFrame can be written to.
GlueContext is the entry point for reading and writing a DynamicFrame from and to Amazon Simple Storage Service (Amazon S3), the AWS Glue Data Catalog, JDBC, and so on. This class provides utility functions to create DataSource trait and DataSink objects that can in turn be used to read and write DynamicFrames.
On the Node properties tab, enter a name for the node in the job diagram. In the Node properties tab, under the heading Node parents, add a parent node so that there are two datasets providing inputs for the join. The parent can be a data source node or a transform node. A join can have only two parent nodes.
Many of the AWS Glue PySpark dynamic frame methods include an optional parameter named transformation_ctx, which is used to identify state information for a job bookmark. If you do not pass in the transformation_ctx parameter, then job bookmarks are not enabled for a dynamic frame or table used in the method.
https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
I think this is what is going on. I wish the AWS docs would explicitly state it.
Bookmarks alone would only let you pick up at the next piece of data (e.g. the next file in S3). But for a complex job with Dynamic Frames, the job itself is stateful. To resume processing, you need to not only pick up with the next piece of input, but also restore the state you had built up within your Dynamic Frames during the last run. The transformation_ctx is like a filename for saving the Dynamic Frame state. You have to name it, because AWS Glue isn't going to analyze your script to figure out which dynamic frame invocation is which.
Inferred primarily from Tracking Processed Data Using Job Bookmarks, which is the same page that other answers linked, but has somewhat clarified text since they quoted it:
Many of the AWS Glue PySpark dynamic frame methods include an optional parameter named transformation_ctx, which is a unique identifier for the ETL operator instance. The transformation_ctx parameter is used to identify state information within a job bookmark for the given operator. Specifically, AWS Glue uses transformation_ctx to index the key to the bookmark state.
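To make the "key to the bookmark state" idea concrete, here is a toy model in plain Python. It is only a sketch of the concept, not how AWS Glue implements bookmarks: ToyBookmarkStore and read_new are hypothetical names that do not exist in the awsglue library, and the toy records state immediately, whereas a real job persists bookmark state when it commits. The point is just that state is looked up by the transformation_ctx string, and that an empty string means no state is tracked for that call.

```python
from typing import Dict, List, Set


class ToyBookmarkStore:
    """Hypothetical stand-in for persisted bookmark state (not a Glue API)."""

    def __init__(self) -> None:
        # Maps each transformation_ctx string to the inputs it already processed.
        self._seen: Dict[str, Set[str]] = {}

    def read_new(self, available: List[str], transformation_ctx: str = "") -> List[str]:
        """Return only the inputs not yet processed under this context key."""
        if not transformation_ctx:
            # No key, so there is nowhere to look up or save state:
            # every input is treated as new on every run.
            return list(available)
        seen = self._seen.setdefault(transformation_ctx, set())
        new_inputs = [name for name in available if name not in seen]
        seen.update(new_inputs)
        return new_inputs


store = ToyBookmarkStore()

# First "run": both files are new for the key "orders_source".
run1 = store.read_new(["a.csv", "b.csv"], transformation_ctx="orders_source")

# Second "run": only the file added since the first run is returned.
run2 = store.read_new(["a.csv", "b.csv", "c.csv"], transformation_ctx="orders_source")

# Without a transformation_ctx, nothing is remembered, so everything is reprocessed.
no_ctx = store.read_new(["a.csv", "b.csv", "c.csv"])
```

In this model, two calls that share the same transformation_ctx would also share state, which is why each operator instance in a script needs its own unique name.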