I currently use Amazon Redshift and Amazon S3 to store data. I want to use Redshift Spectrum to improve performance, but I am confused about how to use it properly. If I am using SQL Workbench, can I create the external schema from there, or do I need to create it from the AWS console or Athena? Do I need to have Athena in a specific region? Is it possible to use Spectrum without Athena? When I try to create an external schema through SQL Workbench, it throws the error "CREATE EXTERNAL SCHEMA is not enabled". How can I enable this? If anyone has used Spectrum, please let me know the detailed steps to use it.

What are the steps to use Redshift Spectrum?

1 Answer

Redshift Spectrum requires an external data catalog that contains the definition of the table. It is this data catalog that contains the reference to the files in S3, rather than the external table definition in Redshift. This data catalog can be defined in Elastic MapReduce as a Hive Catalog (good if you have an existing EMR deployment) or in Athena (good if you don't have EMR or don't want to get into managing Hadoop). The Athena route can be managed fully by Redshift, if you wish.

It looks to me like your issue is one of four things:

1. Your Redshift cluster is not in an AWS region that currently supports Athena and Spectrum.
2. Your Redshift cluster version doesn't support Spectrum yet (it needs to be 1.0.1294 or later).
3. Your IAM policies don't allow Redshift control over Athena.
4. You're not using the CREATE EXTERNAL DATABASE IF NOT EXISTS clause on your CREATE EXTERNAL SCHEMA statement.
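The version requirement can be checked by running SELECT version() on the cluster and reading the Redshift version at the end of the result. As a rough sketch, the comparison could be done in Python like this; the sample string is a shortened, hypothetical example of that output, and the function names are made up for illustration.

```python
import re

# Minimum cluster version for Spectrum, as quoted in the list above.
MIN_SPECTRUM_VERSION = (1, 0, 1294)


def redshift_version(version_string):
    """Extract the Redshift version tuple from the text returned by SELECT version()."""
    match = re.search(r"Redshift (\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("no Redshift version found in: " + version_string)
    return tuple(int(part) for part in match.groups())


def supports_spectrum(version_string):
    """Return True if the reported version is at least the Spectrum minimum."""
    return redshift_version(version_string) >= MIN_SPECTRUM_VERSION


# Hypothetical, shortened sample of what SELECT version() returns.
sample = "PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC, Redshift 1.0.1294"
print(supports_spectrum(sample))
```

If this reports False for your cluster, the cluster needs to be upgraded before CREATE EXTERNAL SCHEMA will work.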

To allow Redshift to manage Athena, you'll need to attach an IAM role to your Redshift cluster whose policy allows it full control over Athena, as well as read access to the S3 bucket containing your data.
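As an illustration only, a policy document along those lines could be generated like this. The bucket name is a placeholder, and the exact list of actions is an assumption rather than an official minimal set; clusters that use the AWS Glue Data Catalog instead of the Athena catalog would need Glue permissions as well.

```python
import json


def spectrum_policy(bucket):
    """Build a policy document: full Athena access plus read access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Assumption: "full control over Athena" expressed as a wildcard on Athena actions.
            {"Effect": "Allow", "Action": "athena:*", "Resource": "*"},
            {
                # Assumption: listing the bucket and reading its objects is enough for "read access".
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::" + bucket,
                    "arn:aws:s3:::" + bucket + "/*",
                ],
            },
        ],
    }


# "my-data-bucket" is a hypothetical bucket name.
policy = spectrum_policy("my-data-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy on the role that is attached to the cluster.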

Once that's in place, you can create your external schema as you have been already, ensuring that the CREATE EXTERNAL DATABASE IF NOT EXISTS argument is also passed. This makes sure that the external database is created in Athena if you don't have a pre-existing configuration: http://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum-create-external-table.html
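For reference, here is a small sketch that renders such a statement so the placeholders are explicit. The schema name, database name and role ARN are hypothetical values, and the helper does no escaping, so treat it as an illustration of the statement's shape rather than something to run against untrusted input.

```python
def create_external_schema_sql(schema, database, iam_role_arn):
    """Render a CREATE EXTERNAL SCHEMA statement that also creates the external database."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


# All three arguments are placeholder values.
statement = create_external_schema_sql(
    schema="spectrum",
    database="spectrumdb",
    iam_role_arn="arn:aws:iam::123456789012:role/MySpectrumRole",
)
print(statement)
```

The printed statement can be run from SQL Workbench like any other Redshift statement; it does not have to be issued from the AWS console or from Athena.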

Finally, run your CREATE EXTERNAL TABLE statement, which will transparently create the table metadata in the Athena data catalog: http://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html
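A table definition could be rendered in the same spirit. In this sketch the table name, columns and S3 location are invented for illustration, and the tab-delimited text format is just one of the formats Spectrum can read; adjust the ROW FORMAT and STORED AS clauses to match your files.

```python
def create_external_table_sql(schema, table, columns, s3_location):
    """Render a CREATE EXTERNAL TABLE statement for tab-delimited text files in S3."""
    column_list = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_list}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY '\\t'\n"  # emits the two characters \t inside the SQL literal
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical table layout and bucket path.
table_statement = create_external_table_sql(
    schema="spectrum",
    table="sales",
    columns=[("salesid", "INTEGER"), ("saletime", "TIMESTAMP"), ("pricepaid", "DECIMAL(8,2)")],
    s3_location="s3://my-data-bucket/sales/",
)
print(table_statement)
```

Once the table exists, it is queried through the external schema like a regular table, for example with SELECT COUNT(*) FROM spectrum.sales.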

117

answered Nov 15 '22 04:11

GShenanigan


Tags:

amazon-web-services

amazon-s3

amazon-redshift

amazon-redshift-spectrum

Pratik Rawlekar
