
AWS Glue: `ImportError: cannot import name 'S3ArnParamHandler'`

I developed a pandas ETL script locally, and it works fine.

I packaged it as a wheel file and uploaded it to S3. All packages were installed properly.

However, when the script runs, it fails with `ImportError: cannot import name 'S3ArnParamHandler'`.

Below is my `requirements.txt`:

awscli==1.18.140
asn1crypto==1.4.0
awswrangler==1.9.3
azure-common==1.1.25
azure-core==1.8.1
azure-storage-blob==12.5.0; python_version >= '3.5.2'
boto3==1.14.63
botocore==1.17.63
certifi==2020.6.20
cffi==1.14.2
chardet==3.0.4
cryptography==2.9.2; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
docutils==0.15.2; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2'
fsspec==0.8.2
idna==2.9; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
isodate==0.6.0
jmespath==0.10.0; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2'
msrest==0.6.19
numpy==1.19.2
oauthlib==3.1.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
oscrypto==1.2.1
packaging==20.4; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pandas==1.0.0
psycopg2-binary==2.8.6; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pyarrow==1.0.1; python_version >= '3.5'
pycparser==2.20; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pycryptodomex==3.9.8; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2, 3.3'
pyjwt==1.7.1
pymysql==0.9.0
pyopenssl==19.1.0
pyparsing==2.4.7; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2'
python-dateutil==2.8.1
pytz==2020.1
requests-oauthlib==1.3.0
requests==2.23.0
s3fs==0.4.2
s3transfer==0.3.3
six==1.15.0
snowflake-connector-python==2.3.2; python_version >= '3.5'
snowflake-sqlalchemy==1.2.3
sqlalchemy-redshift==0.8.1
sqlalchemy==1.3.13
urllib3==1.25.10
xlrd==1.2.0
JOHN asked Sep 18 '20 08:09



2 Answers

It seems that, at the moment, it is not possible to override the botocore and boto3 library versions in an AWS Glue Python shell job (https://github.com/boto/boto3/issues/2566).

The provided versions are:

  • botocore 1.12.232
  • boto3 1.9.203

aiobotocore tries to import something that is not available in botocore 1.12.232.
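One way to confirm this from inside the Glue job is to log which boto3/botocore versions were actually imported and whether `botocore.utils` exposes `S3ArnParamHandler`. This is a minimal sketch, not something from the original answer: it assumes that newer botocore releases define `S3ArnParamHandler` in `botocore.utils`, and the helper names (`module_version`, `has_attribute`, `report`) are hypothetical.

```python
import importlib


def module_version(name):
    """Return the __version__ of an importable module, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def has_attribute(module_name, attr):
    """Return True if `module_name` can be imported and defines `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


def report():
    """Collect the versions Glue actually loaded, regardless of what the wheel pinned."""
    return {
        "boto3": module_version("boto3"),
        "botocore": module_version("botocore"),
        # Assumption: newer botocore releases define S3ArnParamHandler in botocore.utils.
        "S3ArnParamHandler available": has_attribute("botocore.utils", "S3ArnParamHandler"),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

If the job prints the older botocore version and `S3ArnParamHandler available: False`, the pins in `requirements.txt` were shadowed by the versions Glue provides.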

I know it's not a perfect solution, but in my case I had to remove or downgrade every dependency that relies on features unavailable in those boto versions to make the Glue job work correctly.

Tomasz answered Oct 21 '22 13:10


Tomasz is correct: it has to do with the older versions of boto3/botocore that Glue currently uses (as of April 2021).

If you try to create a pandas DataFrame from a file stored in S3 using `read_csv` / `read_excel`, you will get this error.

You can work around it by first downloading the file to a local directory and then passing the local file name to the pandas `read_*` function. Not pretty, I know.
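A minimal sketch of that workaround, assuming a boto3 S3 client (its `download_file(Bucket, Key, Filename)` method) and pandas; the helper name `read_s3_csv_via_local_file` is hypothetical. The file is downloaded into a temporary directory, parsed from the local path, and the directory is deleted afterwards. Because pandas only ever sees a local path, it does not need its S3 filesystem backend (s3fs) to open the file.

```python
import os
import tempfile


def read_s3_csv_via_local_file(s3_client, bucket, key, reader=None, **read_kwargs):
    """Download s3://bucket/key to a temporary local file, then parse it locally.

    s3_client: a boto3 S3 client (anything with download_file(Bucket, Key, Filename)).
    reader: callable that takes a local path; defaults to pandas.read_csv.
            Pass pandas.read_excel for Excel files.
    """
    if reader is None:
        import pandas as pd  # imported lazily so the helper itself has no hard pandas dependency
        reader = pd.read_csv

    with tempfile.TemporaryDirectory() as tmpdir:
        # Keep the object's file name (and extension) for the local copy.
        filename = key.rsplit("/", 1)[-1] or "download"
        local_path = os.path.join(tmpdir, filename)
        s3_client.download_file(bucket, key, local_path)
        # pandas reads the whole file here, before the temporary directory is removed.
        return reader(local_path, **read_kwargs)


# Example usage inside a Glue job (bucket and key are placeholders):
#   import boto3
#   df = read_s3_csv_via_local_file(boto3.client("s3"), "my-bucket", "path/to/data.csv")
```

Any extra keyword arguments (for example `sep` or `sheet_name`) are forwarded unchanged to the pandas reader.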

Rich G answered Oct 21 '22 12:10