AWS Glue: Keep partitioned column as value in row after writing

Does anyone know whether it's possible to tell the Glue writer to keep the column you're partitioning on in the actual dataframe?

The AWS blog post at https://aws.amazon.com/blogs/big-data/work-with-partitioned-data-in-aws-glue/ says:

Here, $outpath is a placeholder for the base output path in S3. The partitionKeys parameter can also be specified in Python in the connection_options dict:

glue_context.write_dynamic_frame.from_options(
    frame = projectedEvents,
    connection_type = "s3",
    connection_options = {"path": "$outpath", "partitionKeys": ["type"]},
    format = "parquet")

When you execute this write, the type field is removed from the individual records and is encoded in the directory structure.

I would like to keep the type field in the individual record.

Mat asked Nov 30 '25 18:11


1 Answer

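I am not 100% sure whether it is possible to tell Glue to keep the column, but in the meantime you could use this workaround: copy the column under a new name and partition on the copy, so the original stays in each record.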

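from awsglue.dynamicframe import DynamicFrame

# withColumn is a Spark DataFrame method, so convert the DynamicFrame first.
df = projectedEvents.toDF()
df = df.withColumn("type_partition", df["type"])
projectedEvents = DynamicFrame.fromDF(df, glue_context, "projectedEvents")
# Partition on the copy; the original "type" column stays in each record.
glue_context.write_dynamic_frame.from_options(
    frame=projectedEvents,
    connection_type="s3",
    connection_options={"path": "$outpath", "partitionKeys": ["type_partition"]},
    format="parquet",
)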
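
To see why partitioning on a copy works, here is a rough pure-Python model of Hive-style partitioned output. It is only an illustration and not the Glue or Spark API: the `write_partitioned` helper, the sample events, and the `type_partition` name are all made up for this sketch. It assumes the writer simply moves each partition key out of the row and into the directory name.

```python
from collections import defaultdict


def write_partitioned(records, partition_keys):
    """Toy model of Hive-style partitioned output (hypothetical helper, not a Glue call).

    Returns {relative_directory: [rows]}, with every partition key
    removed from the row and encoded in the directory name instead.
    """
    output = defaultdict(list)
    for record in records:
        directory = "/".join(f"{key}={record[key]}" for key in partition_keys)
        row = {k: v for k, v in record.items() if k not in partition_keys}
        output[directory].append(row)
    return dict(output)


# Made-up sample data for illustration only.
events = [
    {"id": 1, "type": "PushEvent"},
    {"id": 2, "type": "ForkEvent"},
]

# Partitioning on "type" directly: the column disappears from the rows.
dropped = write_partitioned(events, ["type"])

# Workaround: duplicate the column and partition on the copy instead.
with_copy = [{**event, "type_partition": event["type"]} for event in events]
kept = write_partitioned(with_copy, ["type_partition"])
```

In this model `dropped` has rows containing only `id`, while every row in `kept` still carries `type`. One thing to keep in mind: readers that discover partitions from the path (Spark does this by default) will most likely show both `type` and `type_partition` when you read the data back, so you may want to drop the copy on read.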

Robert Kossendey answered Dec 02 '25 10:12