I want to use an AWS Glue ETL job to read data from S3, since with ETL jobs I can set the number of DPUs to hopefully speed things up.
But how do I do it? I tried:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
inputGDF = glueContext.create_dynamic_frame_from_options(connection_type = "s3", connection_options = {"paths": ["s3://pinfare-glue/testing-csv"]}, format = "csv")
outputGDF = glueContext.write_dynamic_frame.from_options(frame = inputGDF, connection_type = "s3", connection_options = {"path": "s3://pinfare-glue/testing-output"}, format = "parquet")
But it appears nothing is written. My testing-csv folder contains dated subfolders (e.g. 2018-09-26) with the CSV files inside them, and my output S3 path only contains a marker file like testing_output_$folder$. What's incorrect?
I believe the issue here is that you have subfolders within the testing-csv folder, and since you did not set recurse to true, Glue is not able to find the files in the 2018-09-26 subfolder (or in fact any other subfolders).
You need to add the recurse option as follows:
inputGDF = glueContext.create_dynamic_frame_from_options(connection_type = "s3", connection_options = {"paths": ["s3://pinfare-glue/testing-csv"], "recurse": True}, format = "csv")
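For reference, here is a minimal end-to-end sketch of the job with that fix applied. The paths are the ones from your question; the withHeader format option is an assumption that your CSV files have a header row, and it finishes with job.commit(), which your snippet omits:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read every CSV file under the prefix, including the dated subfolders,
# by setting "recurse": True
inputGDF = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://pinfare-glue/testing-csv"], "recurse": True},
    format="csv",
    format_options={"withHeader": True}  # assumption: the CSVs have a header row
)

# Write the DynamicFrame back to S3 as Parquet
glueContext.write_dynamic_frame.from_options(
    frame=inputGDF,
    connection_type="s3",
    connection_options={"path": "s3://pinfare-glue/testing-output"},
    format="parquet"
)

job.commit()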
Also, regarding your question about crawlers in the comments: they help infer the schema of your data files. In your case a crawler does nothing here, since you are creating the DynamicFrame directly from S3.
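If you did want to go through a crawler, you would crawl the bucket into a Data Catalog table first and then read from the catalog instead of from S3 directly. A sketch, where my_database and testing_csv are placeholder names for whatever the crawler creates in your account:

# Reading via the Glue Data Catalog instead of straight from S3.
# "my_database" and "testing_csv" are hypothetical catalog names; substitute
# the database and table your crawler actually creates.
inputGDF = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="testing_csv"
)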