I need to work on some data that is stored in an Amazon Redshift server. Since I'm most familiar with Django, I'm thinking of using that at the Application layer.
But I can't figure out how to connect to Redshift and retreive data. I only have experience with connecting to local MySQL servers and working.
So this seems to be my only option:
https://github.com/binarydud/django-redshift : A Redshift database backend for django. But I'm not sure about it's stability and I'd hate to run into serious problems later so I'm a bit skeptical about this.
Another option might be (correct me if I'm wrong), that I use this: https://github.com/lionheart/django-pyodbc/
Since Redshift supports ODBC connections, this should work right? Will the Django ORM work normally? Any issues you can foresee?
I'd love for some insight into my situation by someone more experienced with Django. Thanks a lot!
Redshift's interface is built to understand PostgreSQL 8.0.2: http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html
So for reading data from Redshift Django's PostgreSQL, a current project I'm working on just uses the Django 'psycopg2' backend. This is from the settings:
DATABASES = {
'default': {
'NAME': '[cluster name]',
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'USER': '[your user]',
'PASSWORD': '[your pw]',
'HOST': '[path to your cluster].redshift.amazonaws.com',
'PORT': 5439,
},
}
More info about connecting is here - http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-in-code.html - but it's only for Java and .NET at the moment.
The ORM should work normally for most queries, although they won't be optimised for Redshift - so keep an eye on how many queries you're emitting, because slow queries sent unnecessarily lots of times in a single view isn't good.
I'm playing around with https://github.com/jazzband/django-redshift-backend, anyone else?
My plan was to use the ORM for meta-data type of tables(systems, manufactures etc) which, have small amounts of data and dist all, then use raw sql for any aggregations functions on the timeseries tables and/or use a caching tables for more intensive activities.
So far, I've created a couple of models, using ForeignKey to associate them, it's working. Next, I was going to try to leverage admin.
We might need to switch to a two database architecture: one for meta-data, one for timeseries data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With