I have just been playing around with Glue but have yet to get it to successfully create a new table in an existing S3 bucket. The job executes without error, but no output ever appears in S3.
Here's the auto-generated code:
glueContext.write_dynamic_frame.from_options(
    frame = applymapping1,
    connection_type = "s3",
    connection_options = {"path": "s3://glueoutput/output/"},
    format = "json",
    transformation_ctx = "datasink2")
I have tried all variations of this: with the name of a file (that doesn't exist yet), in the root folder of the bucket, with and without a trailing slash. The role being used has full access to S3, and I have tried creating buckets in different regions. No file is ever created, yet the console says the job succeeded.
The role can read from and write to the S3 bucket. Job type: Spark. Glue version: Spark 2.4, Python 3. "This job runs": a new script to be authored by you.
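For what it's worth, the path variations described above should all be equivalent to Glue as long as they resolve to the same prefix. A small sketch (the `normalize_s3_prefix` helper is hypothetical, not part of AWS Glue) showing how the tried variants collapse to one prefix:

```python
# Hypothetical helper: normalizes an S3 output path so that
# write_dynamic_frame.from_options always receives the same prefix shape,
# regardless of which variation was typed into connection_options.
def normalize_s3_prefix(path: str) -> str:
    """Ensure the path has an s3:// scheme and a trailing slash."""
    if not path.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {path!r}")
    return path if path.endswith("/") else path + "/"

# The variations tried above all normalize to the same prefix:
print(normalize_s3_prefix("s3://glueoutput/output"))   # s3://glueoutput/output/
print(normalize_s3_prefix("s3://glueoutput/output/"))  # s3://glueoutput/output/
```

So if the job reports success and the prefix is well-formed, the problem is more likely in what is being written (an empty frame, or a bookmark skipping the input) than in how the path is spelled.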
As @Drellgor suggests in his comment on the previous answer, make sure you have disabled "Job Bookmarks" unless you definitely don't want to process old files.
From the documentation:
"AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data."
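If bookmarks turn out to be the culprit, they can be turned off with the job parameter below (a sketch of the job-argument form; set it in the job's configuration or in the `--default-arguments` map when creating the job):

```json
"--job-bookmark-option": "job-bookmark-disable"
```

Alternatively, the existing bookmark state can be cleared with the AWS CLI command `aws glue reset-job-bookmark --job-name <your-job-name>`, which makes the next run reprocess all input files.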
Your code is correct; just verify whether there is any data at all in the applymapping1 DynamicFrame. You can check with this command: applymapping1.toDF().show(), or get the record count with applymapping1.count().