Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I build a front end for querying a Redshift database (hopefully with Rails)

So I have a Redshift database with enough tables that it feels worth my time to build a front end to make querying it a little bit easier than just typing in SQL commands.

Ideally, I'd be able to do this by connecting the database to a Rails app (because I have a bit of experience with Rails). I'm not sure how I'd connect a remote Redshift database to a local Rails application though, or how to make activerecord work with redshift.

Does anyone have any suggestions/resources to help me get started? I'm open to other options to connect the Redshift database to a front end if there are pre-made options easier than Rails.

like image 746
johncorser Avatar asked Jul 15 '14 14:07

johncorser


People also ask

How do you query data in Redshift?

To query databases hosted by your Amazon Redshift cluster, you have two options: Connect to your cluster and run queries on the AWS Management Console with the query editor. If you use the query editor on the Amazon Redshift console, you don't have to download and set up a SQL client application.

What query language does Amazon Redshift use?

The SQL language consists of commands that you use to create and manipulate database objects, run queries, load tables, and modify the data in tables. Amazon Redshift is based on PostgreSQL.

How do I run a query against Redshift?

To use the query editor on the Amazon Redshift console On the navigation menu, choose Query editor, then connect to a database in your cluster. For Schema, choose public to create a new table based on that schema. Enter the following in the query editor window and choose Run to create a new table.

Which node is used to run SQL queries in Amazon Redshift?

The Query Editor provides an in-browser interface for running SQL queries on Amazon Redshift clusters. For queries that are run on compute nodes, you can then view the query results and query execution plan next to your queries.


2 Answers

#app/models/data_warehouse.rb
class DataWarehouse < ActiveRecord::Base                      
  establish_connection "redshift_staging"
  #or, if you want to have a db per environment
  #establish_connection "redshift_#{Rails.env}"
end

Note that we are connecting on 5439, not the default 5432 so I specify the port Also, I specify a schema, beta, which is what we use for our unstable aggregates, you could either have a different db per environment as mentioned above, or use various schemas and include them in the search path for ActiveRecord

#config/database.yml
redshift_staging:                                                          
  adapter: postgresql                                                      
  encoding: utf8                                                           
  database: db03                                                         
  port: 5439                                                               
  pool: 5                                                                  
  schema_search_path: 'beta'                                                                                          
  username: admin                                                        
  password: supersecretpassword                                               
  host: db03.myremotehost.us  #your remote host here, might be an aws url from Redshift admin console 

###OPTION 2, a direct PG Connection

  class DataWarehouse < ActiveRecord::Base                      

    attr_accessor :conn                                                       

    def initialize                                                            
      @conn = PG.connect(                                                     
       database: 'db03',                                                   
       port: 5439,                                                           
       pool: 5,                                                              
       schema_search_path: 'beta',                                           
       username: 'admin',                                                  
       password: 'supersecretpassword',                                         
       host: 'db03.myremotehost.us'                                               
      )                                                                       
    end    
  end


[DEV] main:0> redshift = DataWarehouse
E, [2014-07-17T11:09:17.758957 #44535] ERROR -- : PG::InsufficientPrivilege: ERROR:  permission denied to set parameter "client_min_messages" to "notice" : SET client_min_messages TO 'notice'
(pry) output error: #<ActiveRecord::StatementInvalid: PG::InsufficientPrivilege: ERROR:  permission denied to set parameter "client_min_messages" to "notice" : SET client_min_messages TO 'notice'>   

UPDATE:

I ended up going with option 1, but using this adapter for now for multiple reasons:

https://github.com/fiksu/activerecord-redshift-adapter

Reason 1: ActiveRecord postgresql adapter sets client_min_messages Reason 2: adapter also attempts to set Time Zone, which redshift doesn't allow (http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html) Reason 3: Even if you change the code in ActiveRecord for the first two errors, you run into additional errors that complain that Redshift is using Postgresql 8.0, at that point I moved on to the adapter, will revisit and update if I find something better later.

I renamed my table to base_aggregate_redshift_tests (notice plural) so ActiveRecord was easily able to connect, if you can't change your table names in redshift use the set_table method I have commented out below

#Gemfile:
gem 'activerecord4-redshift-adapter', github: 'aamine/activerecord4-redshift-adapter'

Option 1

#config/database.yml
redshift_staging:                                                                                                             
  adapter: redshift                                                                                                           
  encoding: utf8                                                                                                              
  database: db03                                                                                                           
  port: 5439                                                                                                                  
  pool: 5                                                                                                                     
  username: admin                                                                                                
  password: supersecretpassword                                                                                                  
  host: db03.myremotehost.us                                                                                                       
  timeout: 5000   

#app/models/base_aggregates_redshift_test.rb
#Model named to match my tables in Redshift, if you want you can set_table like I have commented out below

class BaseAggregatesRedshiftTest < ActiveRecord::Base
  establish_connection "redshift_staging"
  self.table_name = "beta.base_aggregates_v2"
end

in console using self.table_name -- notice it queries the right table, so you can name your models whatever you want

[DEV] main:0> redshift = BaseAggregatesRedshiftTest.first                                                                    
D, [2014-07-17T15:31:58.678103 #43776] DEBUG -- :   BaseAggregatesRedshiftTest Load (45.6ms)  SELECT "beta"."base_aggregates_v2".* FROM "beta"."base_aggregates_v2" LIMIT 1            

Option 2

#app/models/base_aggregates_redshift_test.rb
class BaseAggregatesRedshiftTest < ActiveRecord::Base
  set_table "beta.base_aggregates_v2"

  ActiveRecord::Base.establish_connection(
    adapter: 'redshift',
    encoding: 'utf8',
    database: 'staging',
    port: '5439',
    pool: '5',
    username: 'admin',
    password: 'supersecretpassword',
    search_schema: 'beta',
    host: 'db03.myremotehost.us',
    timeout: '5000'
  )

end

#in console, abbreviated example of first record, now it's using the new name for my redshift table, just assuming I've got the record at base_aggregates_redshift_tests because I didn't set the table_name

[DEV] main:0> redshift = BaseAggregatesRedshiftTest.first
D, [2014-07-17T15:09:39.388918 #11537] DEBUG -- :   BaseAggregatesRedshiftTest Load (45.3ms)  SELECT "base_aggregates_redshift_tests".* FROM "base_aggregates_redshift_tests" LIMIT 1
#<BaseAggregatesRedshiftTest:0x007fd8c4a12580> {
                                                :truncated_month => Thu, 31 Jan 2013 19:00:00 EST -05:00,
                                                :dma => "Cityville",
                                                :group_id => 9712338,
                                                :dma_id => 9999 
                                                }

Good luck @johncorser!

like image 189
harrisjb Avatar answered Oct 26 '22 22:10

harrisjb


This tutorial helps you setup a rails app with a redshift adapter:

https://www.credible.com/code/setting-up-a-data-warehouse-with-aws-redshift-and-ruby/

In a nutshell:

Clone the sample app:

git clone [email protected]:tuesy/redshift-ruby-tutorial.git
cd redshift-ruby-tutorial

Setup ENV Variables via ~/.bashrc (or dotenv):

export REDSHIFT_HOST=redshift-ruby-tutorial.ccmj2nxbsay7.us-east-1.redshift.amazonaws.com
export REDSHIFT_PORT=5439
export REDSHIFT_USER=deploy
export REDSHIFT_PASSWORD=<your password here>
export REDSHIFT_DATABASE=analytics
export REDSHIFT_BUCKET=redshift-ruby-tutorial

Use the gem activerecord4-redshift-adapter, in Gemfile:

'activerecord4-redshift-adapter', '~> 0.2.0' # For Rails 4.2
'activerecord4-redshift-adapter', '~> 0.1.1' # For Rails 4.1

Then you can query into redshift like you would with a normal AR model:

bundle exec rails c
RedshiftUser.count

(Disclosure: I havn't yet given this method a try, but I may soon)

like image 38
Benjamin Crouzier Avatar answered Oct 27 '22 00:10

Benjamin Crouzier