I developed a pandas ETL script locally and it works fine.
I prepared a wheel file and uploaded it to S3. All packages are installed properly.
However, when the script runs on AWS Glue, it fails with ImportError: cannot import name 'S3ArnParamHandler'.
Below is my requirements.txt:
awscli==1.18.140
asn1crypto==1.4.0
awswrangler==1.9.3
azure-common==1.1.25
azure-core==1.8.1
azure-storage-blob==12.5.0; python_version >= '3.5.2'
boto3==1.14.63
botocore==1.17.63
certifi==2020.6.20
cffi==1.14.2
chardet==3.0.4
cryptography==2.9.2; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
docutils==0.15.2; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2'
fsspec==0.8.2
idna==2.9; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
isodate==0.6.0
jmespath==0.10.0; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2'
msrest==0.6.19
numpy==1.19.2
oauthlib==3.1.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
oscrypto==1.2.1
packaging==20.4; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pandas==1.0.0
psycopg2-binary==2.8.6; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pyarrow==1.0.1; python_version >= '3.5'
pycparser==2.20; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pycryptodomex==3.9.8; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2, 3.3'
pyjwt==1.7.1
pymysql==0.9.0
pyopenssl==19.1.0
pyparsing==2.4.7; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2'
python-dateutil==2.8.1
pytz==2020.1
requests-oauthlib==1.3.0
requests==2.23.0
s3fs==0.4.2
s3transfer==0.3.3
six==1.15.0
snowflake-connector-python==2.3.2; python_version >= '3.5'
snowflake-sqlalchemy==1.2.3
sqlalchemy-redshift==0.8.1
sqlalchemy==1.3.13
urllib3==1.25.10
xlrd==1.2.0
According to the AWS Glue documentation: "Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported."
It seems that, right now, it is not possible to override the botocore and boto3 library versions on the AWS Glue Python shell (https://github.com/boto/boto3/issues/2566).
The versions provided by Glue are older; aiobotocore, for example, is looking for an import that is not available in botocore 1.12.232.
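As a quick sanity check inside the job itself, here is a minimal diagnostic sketch (assuming the missing name lives in botocore.utils, where newer botocore releases define it) that prints which versions Glue actually resolves:

import boto3
import botocore

# Show the versions and install locations the job actually imports.
print("boto3:", boto3.__version__, boto3.__file__)
print("botocore:", botocore.__version__, botocore.__file__)

try:
    # Newer botocore releases define this; the 1.12 line does not.
    from botocore.utils import S3ArnParamHandler  # noqa: F401
    print("S3ArnParamHandler is available")
except ImportError:
    print("S3ArnParamHandler is missing from this botocore")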
I know that's not a perfect solution, but in my case I had to remove or downgrade every dependency that uses features not available in those boto libraries to make the Glue job work correctly (see the illustrative pins below).
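For illustration only, a trimmed requirements.txt following this approach might look like the snippet below; these pins are an assumption based on the 1.12 botocore line (boto3 and botocore released in lockstep at the time), not versions verified against Glue:

boto3==1.9.232        # assumed pin matching botocore 1.12.232
botocore==1.12.232    # the version the aiobotocore import error points at
s3transfer==0.2.1     # the s3transfer series that boto3 1.9.x depends on
pandas==1.0.0         # unchanged from the original list
python-dateutil==2.8.1
pytz==2020.1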
Tomasz is correct - it has to do with the (lower) versions of boto3 / botocore that are currently (April 2021) in use with Glue.
If you're trying to create a pandas DataFrame from a file stored in S3 using read_csv / read_excel, you will get this error.
You can get around it by first downloading the file to a local directory and then passing that local file name to the pandas read_ function; a sketch is below. Not pretty, I know.
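A minimal sketch of that workaround; the bucket, key, and /tmp path are placeholders to adapt:

import boto3
import pandas as pd

bucket = "my-bucket"          # hypothetical bucket name
key = "input/data.csv"        # hypothetical object key
local_path = "/tmp/data.csv"  # Glue Python shell jobs can write to /tmp

# Download with boto3's managed transfer, then read the local copy so
# pandas never goes through s3fs and its boto dependencies.
s3 = boto3.client("s3")
s3.download_file(bucket, key, local_path)

df = pd.read_csv(local_path)
print(df.head())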