AWS Glue convert files from JSON to Parquet with same partitions as source table

We are using AWS Glue to convert JSON files stored in our S3 data lake.

Here are the steps I followed:

  1. Created a crawler to generate a table in Glue from our data lake bucket, which contains JSON data.

  2. The newly created table has the following partition keys:

    Name, Year, Month, day, hour

  3. Created a Glue job to convert the data to Parquet and store it in a different bucket.

With this process, the job runs successfully, but the data in the new bucket is not partitioned; it all ends up under a single directory.
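For reference, the job's write step looks roughly like the sketch below (the database, table, and bucket names here are placeholders, not our real ones); since no partition keys are specified, everything gets written to one directory:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the crawled table from the Glue Data Catalog.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",       # placeholder
    table_name="my_json_table")   # placeholder

# Write as Parquet; with no partition keys the output is unpartitioned.
glueContext.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-output-bucket/parquet/"},  # placeholder
    format="parquet")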

What I want to achieve is for the converted Parquet files to have the same partitions as the source table/data lake bucket.

Also, I want to increase the size of the Parquet files (i.e., reduce the number of files).

Can anyone help me on this?

asked Feb 12 '18 by Vishnu Prassad




1 Answer

Try the following when writing the dynamic frame:

glueContext.write_dynamic_frame.from_options(
    frame=<output_dataframe>,
    connection_type="s3",
    connection_options={
        "path": "s3://<output_bucket_path>",
        "partitionKeys": ["Name", "Year", "Month", "day", "hour"]
    },
    format="parquet")
answered Oct 12 '22 by kmn