Has anyone tried connecting Superset to AWS Athena?
I was able to connect to Redshift using the SQLAlchemy URI: postgresql://username:password@host:port/dbname
but I am having a hard time connecting to AWS Athena. AWS provides a JDBC driver (http://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html), but I can't figure out how to use it with Superset. Any examples?
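To make the URI format concrete, here is a small sketch that splits a Redshift-style URI like the one above into the parts a driver reads. It uses Python's standard urllib.parse.urlsplit as a stand-in for SQLAlchemy's own URL parser (the two agree on simple URIs like this). The helper name describe_uri, the hostname, and port 5439 (Redshift's default) are illustrative placeholders, not real settings.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a SQLAlchemy-style URI into its components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "dialect": parts.scheme,                # e.g. "postgresql" or "awsathena+rest"
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,                 # urlsplit lowercases the hostname
        "port": parts.port,                     # int, or None if no port is given
        "database": parts.path.lstrip("/"),     # path without the leading slash
    }


# Placeholder values only, not real credentials or a real cluster.
print(describe_uri(
    "postgresql://username:password@example-cluster.redshift.amazonaws.com:5439/dbname"
))
```

The Athena URIs discussed below follow the same username:password@host/path layout, only with a different dialect prefix.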
You can access Athena using the AWS Management Console, a JDBC or ODBC connection, the Athena API, the Athena CLI, the AWS SDK, or AWS Tools for Windows PowerShell.
Use the Amazon Athena console to query the data in your data lake. Open the Athena console at https://console.aws.amazon.com/athena/ , and sign in as the data analyst, user datalake_user . If necessary, choose Get Started to continue to the Athena query editor. For Data source, choose AwsDataCatalog.
In case anyone else ends up here:
awsathena+jdbc://username:password@host:port/dbname
This is from the Superset documentation.
We tried running Superset with both PyAthenaJDBC and PyAthena (REST). Our experience with PyAthena (REST) was far better than with PyAthenaJDBC, and we would recommend using it in production.
Install PyAthena (a pure-Python library; Java is not needed):
pip install "PyAthena>1.2.0"
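If you want to confirm that the install satisfies the "> 1.2.0" requirement (for example, inside the same environment Superset runs in), a minimal check using the standard importlib.metadata module could look like this. The helper names are hypothetical, and the version comparison only looks at the leading digits of each dotted component; the packaging library's version parser is the more robust choice if it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def pyathena_version():
    """Return the installed PyAthena version string, or None if it is not installed."""
    try:
        return version("PyAthena")
    except PackageNotFoundError:
        return None


def as_tuple(v: str) -> tuple:
    """Turn '2.3.0' into (2, 3, 0), keeping only the leading digits of each piece."""
    nums = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m:
            nums.append(int(m.group()))
    return tuple(nums)


def is_newer_than(installed: str, minimum: str) -> bool:
    """True if `installed` is strictly newer than `minimum` (simplified comparison)."""
    return as_tuple(installed) > as_tuple(minimum)


installed = pyathena_version()
if installed is None:
    print("PyAthena is not installed in this environment")
else:
    print(installed, "OK" if is_newer_than(installed, "1.2.0") else "too old")
```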
Then connect to the database with a SQLAlchemy URI of the form:
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
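One practical detail with this template: AWS secret keys can contain characters such as "/" and "+", and the S3 staging directory is itself a URL, so the values should be URL-encoded before they are placed in the URI. A minimal sketch that fills in the template above using the standard urllib.parse.quote_plus and urlencode (the function name and the example values are hypothetical; extra keyword arguments stand in for whatever options the "&..." in the template represents):

```python
from urllib.parse import quote_plus, urlencode


def athena_rest_uri(aws_access_key_id: str,
                    aws_secret_access_key: str,
                    region_name: str,
                    schema_name: str,
                    s3_staging_dir: str,
                    **extra_params: str) -> str:
    """Build an awsathena+rest:// URI with credentials and query values URL-encoded."""
    key = quote_plus(aws_access_key_id)
    secret = quote_plus(aws_secret_access_key)   # "/" -> %2F, "+" -> %2B
    # urlencode escapes each value with quote_plus, so "s3://..." becomes "s3%3A%2F%2F..."
    query = urlencode({"s3_staging_dir": s3_staging_dir, **extra_params})
    return (f"awsathena+rest://{key}:{secret}"
            f"@athena.{region_name}.amazonaws.com/{schema_name}?{query}")


# Placeholder credentials and bucket, for illustration only.
print(athena_rest_uri("AKIAEXAMPLE", "abc/def+ghi", "us-east-1",
                      "default", "s3://my-bucket/athena-results/"))
```

The printed string is what would go into Superset's SQLAlchemy URI field; SQLAlchemy decodes the escaped values again when it parses the URI.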
I found this article to be a good guide on deploying Superset.
Take a look at this GitHub PR. You'll want to install the PyAthenaJDBC package with pip. The driver you are referring to is a Java driver, which is great, but Superset is largely a Python application, so it needs a Python driver to connect to and interact with Athena.
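Superset hands the URI to SQLAlchemy, which resolves a third-party prefix such as "awsathena+rest" by looking up an entry point named "awsathena.rest" in the "sqlalchemy.dialects" group. A rough way to see which such dialects are installed in Superset's environment is sketched below with the standard importlib.metadata module. The helper names are hypothetical, and the entry-point names "awsathena.rest" / "awsathena.jdbc" are an assumption about how PyAthena and PyAthenaJDBC register themselves; SQLAlchemy's built-in dialects (e.g. postgresql) are not entry points and will not appear in this list.

```python
from importlib.metadata import entry_points


def installed_dialects() -> list:
    """List third-party SQLAlchemy dialect entry-point names in this environment."""
    eps = entry_points()
    if hasattr(eps, "select"):                 # Python 3.10+
        group = eps.select(group="sqlalchemy.dialects")
    else:                                      # Python 3.8/3.9 return a dict of lists
        group = eps.get("sqlalchemy.dialects", [])
    return sorted({ep.name for ep in group})


def has_athena_driver(driver: str) -> bool:
    """Check for e.g. 'rest' or 'jdbc'; assumes the 'awsathena.<driver>' naming."""
    # SQLAlchemy maps the URI prefix "awsathena+rest" to the entry point "awsathena.rest".
    return f"awsathena.{driver}" in installed_dialects()


print(installed_dialects())
print("PyAthena REST dialect found:", has_athena_driver("rest"))
print("PyAthenaJDBC dialect found:", has_athena_driver("jdbc"))
```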
The answer above is correct, but you'll want to install that package to make sure you can actually connect to Athena.