 

Formatting AWS Glue output as a JSON array

This is the output I get from my PySpark job in AWS Glue:

{a:1,b:7}
{a:1,b:9}
{a:1,b:3}

but I need to write this data to S3 and send it to an API as a JSON array:

[
 {a:1,b:2}, 
 {a:1,b:7}, 
 {a:1,b:9}, 
 {a:1,b:3}
]
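The per-record output above is in JSON Lines form: one object per line, which is how Spark's JSON writer lays out its files. As a minimal sketch, assuming the keys are quoted (the sample above omits the quotes, which is not valid JSON), that text can be turned into a single array in plain Python. The helper name json_lines_to_array is hypothetical.

```python
import json


def json_lines_to_array(text: str) -> str:
    """Convert JSON Lines text (one object per line) into one JSON array string."""
    # Skip blank lines, e.g. the trailing newline at the end of a file
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(records)


# Usage: per-line records like the ones above, with quoted keys
glue_output = '{"a": 1, "b": 7}\n{"a": 1, "b": 9}\n{"a": 1, "b": 3}\n'
print(json_lines_to_array(glue_output))
# [{"a": 1, "b": 7}, {"a": 1, "b": 9}, {"a": 1, "b": 3}]
```

Parsing and re-serializing each line also catches malformed records early, at the cost of holding every record in memory at once.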

I tried converting my output to a DataFrame and then calling toJSON():

results = mapped_dyF.toDF()
jsonResults = results.toJSON().collect()

but now I can't write the result back to S3 with write_dynamic_frame.from_options, because it expects a DynamicFrame, and jsonResults is no longer a DataFrame at all: it is a plain Python list of JSON strings.
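Since toJSON() yields one serialized JSON object per row, collect() returns a list of str. Those strings cannot go through write_dynamic_frame, but they can be combined into one array string and uploaded with any S3 client. A minimal sketch, where join_json_strings is a hypothetical helper and the sample strings stand in for real collect() output; note that collect() pulls every row to the driver, so this only suits small results.

```python
import json


def join_json_strings(json_results):
    """Combine per-row JSON strings (e.g. from df.toJSON().collect()) into one JSON array string."""
    # Each element is already a serialized JSON object, so wrapping the
    # comma-joined strings in brackets yields a valid array without re-parsing.
    return "[" + ",".join(json_results) + "]"


json_results = ['{"a":1,"b":7}', '{"a":1,"b":9}', '{"a":1,"b":3}']
array_text = join_json_strings(json_results)
print(array_text)  # [{"a":1,"b":7},{"a":1,"b":9},{"a":1,"b":3}]
print(json.loads(array_text)[0]["b"])  # 7
```

An empty list produces "[]", which is still a valid JSON array.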

user2354660 asked Jan 24 '26 20:01


1 Answer

To produce a JSON array, I usually do the following, where df is the DataFrame containing the original data:

# Start with an empty list so data is defined even when df has no rows
data = []
if df.count() > 0:
    # Build the JSON array: one dict per row
    for row in df.collect():
        data.append({"a": row["a"],
                     "b": row["b"]})
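The loop above hardcodes the columns "a" and "b". A minimal sketch of the same idea that takes the column names as a parameter, assuming each row supports row[column] lookup (as PySpark Row objects do; with Spark you could pass df.columns, or use row.asDict() instead). The helper name rows_to_records is hypothetical, and plain dicts stand in for collected rows so the example runs without Spark.

```python
import json


def rows_to_records(rows, columns):
    """Build a list of dicts from row objects that support row[column] lookup."""
    # Same shape as the loop above, but the column names are not hardcoded
    return [{col: row[col] for col in columns} for row in rows]


# Usage with plain dicts standing in for collected Spark Row objects
rows = [{"a": 1, "b": 7}, {"a": 1, "b": 9}]
data = rows_to_records(rows, ["a", "b"])
print(json.dumps(data))  # [{"a": 1, "b": 7}, {"a": 1, "b": 9}]
```

With no rows the result is an empty list, so json.dumps(data) still produces a valid array ("[]").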

I haven't used Glue's write_dynamic_frame.from_options in this case; instead I use boto3 to save the file:

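import json
import uuid

import boto3

s3 = boto3.resource('s3')
# Dump the JSON array to the S3 bucket (bucket_name is defined elsewhere);
# the key has no leading '/', which would otherwise create an empty-named prefix
filename = '{0}_batch.json'.format(uuid.uuid4())
obj = s3.Object(bucket_name, filename)
obj.put(Body=json.dumps(data), ContentType='application/json')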
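The question also needs to send the same array to an API, which the S3 upload does not cover. A minimal sketch using only the standard library, assuming the API accepts a JSON body via HTTP POST; the endpoint URL and the helper name build_post_request are hypothetical. The request is only built here; urllib.request.urlopen(req) would actually send it.

```python
import json
import urllib.request


def build_post_request(url, records):
    """Build (but do not send) an HTTP POST whose body is a JSON array."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; replace with the real API URL
req = build_post_request("https://api.example.com/ingest", [{"a": 1, "b": 7}])
print(req.get_method(), req.full_url)  # POST https://api.example.com/ingest
```

Reusing the same data list for both the S3 upload and the API call keeps the two payloads identical.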
Aida Martinez answered Jan 28 '26 05:01


