Formatting AWS Glue output to JSON object

Question

This is the result I get from my PySpark job in AWS Glue:

{a:1,b:7}
{a:1,b:9}
{a:1,b:3}

but I need to write this data to S3 and send it to an API in JSON array format:

[
 {a:1,b:2}, 
 {a:1,b:7}, 
 {a:1,b:9}, 
 {a:1,b:3}
]

I tried converting my output to a DataFrame and then applying toJSON():

results = mapped_dyF.toDF()
jsonResults = results.toJSON().collect()

but now I am unable to write the result back to S3 with 'write_dynamic_frame.from_options', because it requires a DynamicFrame and my 'jsonResults' is no longer a DataFrame (it is a plain Python list of JSON strings).

Aida Martinez · Accepted Answer

To put the data in JSON array format, I usually do the following, where df is the DataFrame containing the original data:

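# Collect the rows to the driver and build a list of dicts, one per record.
# collect() loads every row into driver memory, so this suits small results.
data = []
for row in df.collect():
    data.append({"a": row["a"], "b": row["b"]})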
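If the DataFrame has many columns, listing every key by hand gets tedious. As a minimal sketch, assuming jsonResults is the list of per-row JSON strings that results.toJSON().collect() returned in the question, the strings can be parsed and re-serialized as one array. The helper name json_strings_to_array is hypothetical, and the hard-coded list only stands in for real Spark output:

```python
import json


def json_strings_to_array(json_strings):
    """Combine per-row JSON strings into a single JSON array string."""
    # Parse each row first so malformed input fails here, not at the API.
    records = [json.loads(s) for s in json_strings]
    return json.dumps(records)


# Stand-in for results.toJSON().collect() from the question (assumed shape).
jsonResults = ['{"a":1,"b":7}', '{"a":1,"b":9}', '{"a":1,"b":3}']

payload = json_strings_to_array(jsonResults)
print(payload)  # [{"a": 1, "b": 7}, {"a": 1, "b": 9}, {"a": 1, "b": 3}]
```

Like the loop above, this still relies on collect(), so it is only reasonable when the result fits comfortably in driver memory.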

I haven't used Glue's write_dynamic_frame.from_options in this case; instead I use boto3 to save the file:

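import json
import uuid

import boto3

s3 = boto3.resource('s3')
bucket_name = 'my-bucket'  # placeholder: replace with your own bucket name
# Use a unique key per batch so repeated runs do not overwrite each other
filename = 'batch_{0}.json'.format(uuid.uuid4())
# Serialize the list as a single JSON array and upload it to S3
obj = s3.Object(bucket_name, filename)
obj.put(Body=json.dumps(data))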
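The upload step can also be wrapped in a small function, which makes it easier to reuse and to test without touching S3. This is only a sketch under assumptions: the names upload_json_array and build_key, the "output" prefix, and the bucket name in the usage comment are all hypothetical, and the client is expected to be a boto3 S3 client (boto3.client('s3')), whose put_object call accepts Bucket, Key, Body and ContentType.

```python
import json
import uuid


def build_key(prefix="output"):
    """Return a unique object key; the prefix is an illustrative assumption."""
    return "{0}/batch_{1}.json".format(prefix, uuid.uuid4())


def upload_json_array(s3_client, bucket_name, records, key=None):
    """Serialize records as one JSON array and put it in the given bucket."""
    key = key or build_key()
    body = json.dumps(records).encode("utf-8")
    # put_object is the boto3 S3 client call for a single-object upload.
    s3_client.put_object(
        Bucket=bucket_name,
        Key=key,
        Body=body,
        ContentType="application/json",
    )
    return key


# Usage inside the Glue job (bucket name is a placeholder):
#   import boto3
#   key = upload_json_array(boto3.client("s3"), "my-bucket", data)
```

Passing the client in as an argument keeps the function free of global state, so a stub object with a put_object method is enough to exercise it locally.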

Tags: pyspark, aws-glue

user2354660
