What is the easiest way to use packages such as NumPy and Pandas within the new ETL tool on AWS called Glue? I have a completed Python script that uses NumPy and Pandas, and I would like to run it in AWS Glue.
According to AWS Glue Documentation: "Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported."
AWS Glue now supports scripts that are compatible with Python 3.6 in Python shell jobs. Previously, Python shell jobs in AWS Glue were compatible only with Python 2.7.
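If you create the job programmatically, the Python version is selected in the job's `Command` definition. The following is a minimal sketch assuming boto3's Glue `create_job` API. The job name, role ARN and S3 script path are hypothetical placeholders, and the boto3 call is left commented out so the sketch runs without AWS credentials.

```python
def python_shell_job_args(name, role_arn, script_s3_path, python_version="3"):
    """Build keyword arguments for a Glue create_job() call for a Python shell job."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",            # Python shell job, not a Spark ETL job
            "ScriptLocation": script_s3_path,
            "PythonVersion": python_version,  # "3" selects Python 3 instead of 2.7
        },
        "MaxCapacity": 0.0625,  # smallest capacity a Python shell job accepts (1/16 DPU)
    }


if __name__ == "__main__":
    args = python_shell_job_args(
        name="my-pandas-job",                                  # hypothetical
        role_arn="arn:aws:iam::123456789012:role/MyGlueRole",  # hypothetical
        script_s3_path="s3://my-bucket/scripts/job.py",        # hypothetical
    )
    print(args)
    # To actually create the job (requires AWS credentials):
    # import boto3
    # boto3.client("glue").create_job(**args)
```

Keeping the argument construction in a plain function makes it easy to inspect before sending anything to AWS.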
AWS Glue now supports the Scala programming language, in addition to Python, to give you choice and flexibility when writing your AWS Glue ETL scripts. You can run these scripts interactively using Glue's development endpoints or create jobs that can be scheduled.
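You can check which Python packages are currently installed by running this script as a Glue job:
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

if __name__ == '__main__':
    # Run pip in a subprocess: pip's internal API is unsupported and prints to stdout, not a return value
    result = subprocess.run([sys.executable, '-m', 'pip', 'list'], stdout=subprocess.PIPE, universal_newlines=True)
    logger.info(result.stdout)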
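If you only need to confirm that NumPy and Pandas are importable inside the job, a narrower check is to import them and log their versions. This is a minimal sketch; it assumes each module exposes the conventional `__version__` attribute and falls back to "unknown" otherwise.

```python
import importlib
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("numpy", "pandas"):
        version = installed_version(name)
        if version is None:
            logger.warning("%s is NOT importable in this job", name)
        else:
            logger.info("%s %s is available", name, version)
```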
As of 30-Jun-2020, Glue has these Python packages pre-installed, so numpy and pandas are covered:
awscli 1.16.242
boto3 1.9.203
botocore 1.12.232
certifi 2020.4.5.1
chardet 3.0.4
colorama 0.3.9
docutils 0.15.2
idna 2.8
jmespath 0.9.4
numpy 1.16.2
pandas 0.24.2
pip 20.0.2
pyasn1 0.4.8
PyGreSQL 5.0.6
python-dateutil 2.8.1
pytz 2019.3
PyYAML 5.2
requests 2.22.0
rsa 3.4.2
s3transfer 0.2.1
scikit-learn 0.20.3
scipy 1.2.1
setuptools 45.1.0
six 1.14.0
urllib3 1.25.8
virtualenv 16.7.9
wheel 0.34.2
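Because this list is only a snapshot, you may want to compare it with what a job reports today. The following is a minimal sketch that parses the text output of `pip list` into a dictionary, assuming pip's default column format (a "Package Version" header, a dashed separator, then one package per line), and reports which snapshot versions have changed. Only the numpy and pandas versions from the list are included as the snapshot.

```python
def parse_pip_list(text):
    """Parse the default column output of `pip list` into {package: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the "-----" separator
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


# Versions taken from the 30-Jun-2020 list
SNAPSHOT_2020_06_30 = {"numpy": "1.16.2", "pandas": "0.24.2"}


def changed_versions(current, snapshot=SNAPSHOT_2020_06_30):
    """Return {package: (snapshot_version, current_version)} for packages that differ."""
    return {
        name: (old, current.get(name))
        for name, old in snapshot.items()
        if current.get(name) != old
    }


if __name__ == "__main__":
    sample = "Package    Version\n---------- -------\nnumpy      1.16.2\npandas     1.0.5\n"
    print(changed_versions(parse_pip_list(sample)))
```

A package that is missing entirely shows up with `None` as its current version.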
You can install additional packages in a Glue Python shell job if they are listed in the requirements.txt used to build the attached .whl file. The .whl file is collected and installed before your script starts. I would also suggest looking into SageMaker Processing, which is easier to use for Python-based jobs. Unlike the serverless instance behind a Glue Python shell job, it does not limit you to 16 GB of memory.