AWS Glue job hangs when calling the AWS Glue client API using boto3 from the context of a running AWS Glue Job?

I'm trying to create a Glue job that enumerates all tables in a database in my catalog. To do so, I use the following code snippet:

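import boto3

# Create a Glue client in the region that hosts the Data Catalog
session = boto3.Session(region_name='us-east-2')
glue = session.client('glue')

# List the tables in the 'customer1' database
tables = glue.get_tables(
    DatabaseName='customer1'
)
print(tables)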

The job hangs for about 15 minutes and the connection appears to be blocked, because I eventually get the following timeout error:

botocore.vendored.requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='glue.us-east-2.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to glue.us-east-2.amazonaws.com timed out. (connect timeout=60)'))

This issue is specific to the Glue API. I can use the S3 API with no problems.
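A plain TCP probe can help confirm that this is a network-level problem rather than something in boto3. The following is a minimal sketch, assuming Python 3 and only the standard library; the helper names `can_connect` and `probe_endpoints` are hypothetical and not part of boto3 or Glue.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_endpoints(hosts, port=443, timeout=5.0):
    """Map each host name to True/False depending on whether it is reachable."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example call (not executed here), comparing the two services from the question:
# probe_endpoints(["glue.us-east-2.amazonaws.com", "s3.us-east-2.amazonaws.com"])
```

If the Glue host reports False while the S3 host reports True, the job's subnet most likely has a route to S3 but no working route to the public Glue endpoint.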

I've gone through all my security groups and opened all ports to traffic from anywhere. I've even added self-referencing rules, but to no avail.

I can't figure out what could be causing the connection to be blocked. Is AWS specifically blocking Glue requests?

Simon Ejsing asked Jun 13 '18 22:06



1 Answer

I faced the same problem: boto3 calls to Glue or S3 hung and eventually timed out.

I fixed it by changing the subnet ID when creating the dev endpoint. Initially I was using a subnet that routed traffic to an Internet Gateway. I switched to a subnet that routes traffic through a NAT gateway. Hope this helps.
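A likely explanation is that the network interfaces Glue creates in the subnet have only private IP addresses, so a default route to an Internet Gateway cannot carry their traffic, while a NAT gateway can. To see which kind of subnet you have, you can inspect its route table. The following is a rough sketch, not a definitive check: `default_route_target` and `check_subnet` are hypothetical helper names, the region default is taken from the question, and `check_subnet` should be run from a machine that can already reach the EC2 API.

```python
def default_route_target(route_table):
    """Classify the 0.0.0.0/0 route of one route table as 'nat', 'igw', 'other' or 'none'.

    `route_table` is one entry of the 'RouteTables' list returned by
    EC2 describe_route_tables.
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat"
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "igw"
        return "other"
    return "none"


def check_subnet(subnet_id, region_name="us-east-2"):
    """Look up the route table explicitly associated with a subnet and classify it."""
    import boto3  # imported here so the classifier above has no dependencies

    ec2 = boto3.client("ec2", region_name=region_name)
    resp = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    tables = resp["RouteTables"]
    if not tables:
        # No explicit association: the subnet falls back to the VPC's main route table
        return "unknown (subnet uses the VPC main route table)"
    return default_route_target(tables[0])
```

A result of 'igw' would match the failing setup described above, and 'nat' the working one. Adding a VPC interface endpoint for Glue is another option that avoids the need for a NAT gateway, though I have not tested that here.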

botchniaque answered Oct 19 '22 22:10
