I am trying to migrate our current data pipeline from Python scripts to AWS Glue. I was able to set up a crawler to pull the schema for the different Postgres databases. However, I am facing issues pulling data from Postgres RDS into S3 tables in Athena.
Thanks in advance!
You can't pull data from AWS RDS to S3 using Athena. Athena is a query engine over data stored in S3. To extract data from RDS to S3, run a Glue job that reads a particular RDS table and writes it to S3 in Parquet format; you can then define an external table over that S3 location and query it with Athena. A sample code snippet that reads from RDS over a JDBC connection and writes Parquet to S3 looks like the one below. Glue also provides some predefined templates that you can use to experiment. Start with a small table first. Please let me know whether it worked for you or if you have any further questions or issues.
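from pyspark.context import SparkContext
from awsglue.context import GlueContext
glueContext = GlueContext(SparkContext.getOrCreate())
tableName = "table_name"  # source table; also used as the S3 prefix
# Read the table from Postgres over JDBC.
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={"url": "jdbc-url/database", "user": "user_name",
                        "password": "password", "dbtable": tableName},
    transformation_ctx="datasource0")
# Write the frame to S3 as Parquet, one prefix per table.
datasink4 = glueContext.write_dynamic_frame.from_options(frame=datasource0, connection_type="s3", connection_options={"path": "s3://aws-glue-tpcds-parquet/" + tableName + "/"}, format="parquet", transformation_ctx="datasink4")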
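Writing Parquet files does not by itself register a table in Athena, so as a rough sketch of the follow-up step, the helper below builds a `CREATE EXTERNAL TABLE` statement that points Athena at the prefix the job wrote. The function name `build_athena_ddl`, the database name `my_db`, and the column list are hypothetical placeholders, not part of the original job; the columns must match whatever schema your table actually has. Running a Glue crawler over the output path is an alternative that infers the columns for you.

```python
def build_athena_ddl(database, table_name, columns, bucket="aws-glue-tpcds-parquet"):
    """Return a CREATE EXTERNAL TABLE statement for Parquet data in S3.

    `columns` is a list of (name, athena_type) pairs. All values used here
    are illustrative assumptions, not taken from a real schema.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_sql = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    # Same layout as the Glue sink above: one S3 prefix per table.
    location = f"s3://{bucket}/{table_name}/"
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table_name}` (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical database and columns, for illustration only.
ddl = build_athena_ddl("my_db", "table_name", [("id", "bigint"), ("name", "string")])
print(ddl)
```

The resulting statement can be pasted into the Athena query editor. Once the table exists, ordinary `SELECT` queries read the Parquet files directly from S3.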