What is the difference? I know that DynamicFrame was created for AWS Glue, but AWS Glue also supports DataFrame. When should DynamicFrame be used in AWS Glue?
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. You want to use DynamicFrame when, Data that does not conform to a fixed schema.
A DynamicFrame is a distributed collection of self-describing DynamicRecord objects. DynamicFrame s are designed to provide a flexible data model for ETL (extract, transform, and load) operations.
A DynamicFrame is similar to a DataFrame , except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.
AWS Glue supports an extension of the PySpark Python dialect for scripting extract, transform, and load (ETL) jobs.
DynamicFrame is safer when handling memory intensive jobs. "The executor memory with AWS Glue dynamic frames never exceeds the safe threshold," while on the other hand, Spark DataFrame could hit "Out of memory" issue on executors. (https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html)
DynamicFrames are designed to provide maximum flexibility when dealing with messy data that may lack a declared schema. Records are represented in a flexible self-describing way that preserves information about schema inconsistencies in the data.
For example, with changing requirements, an address column stored as a string in some records might be stored as a struct in later rows. Rather than failing or falling back to a string, DynamicFrames will track both types and gives users a number of options in how to resolve these inconsistencies, providing fine grain resolution options via the ResolveChoice transforms.
DynamicFrames also provide a number of powerful high-level ETL operations that are not found in DataFrames. For example, the Relationalize transform can be used to flatten and pivot complex nested data into tables suitable for transfer to a relational database. In additon, the ApplyMapping transform supports complex renames and casting in a declarative fashion.
DynamicFrames are also integrated with the AWS Glue Data Catalog, so creating frames from tables is a simple operation. Writing to databases can be done through connections without specifying the password. Moreover, DynamicFrames are integrated with job bookmarks, so running these scripts in the job system can allow the script to implictly keep track of what was read and written.(https://github.com/aws-samples/aws-glue-samples/blob/master/FAQ_and_How_to.md)
You can refer to the documentation here: DynamicFrame Class. It says,
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially.
You want to use DynamicFrame
when,
Note: You can also convert the DynamicFrame
to DataFrame
using toDF()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With