Lambda using python 3.6 & boto3 in VPC times out when connecting to Redshift

I am trying to use boto3 in Python 3.6 to connect to my Redshift cluster using the get_cluster_credentials API. The following code times out every time when the Lambda function is attached to the VPC. It runs without issue when the function is not attached to the VPC.

I can't figure out whether get_cluster_credentials uses the public or the private IP address to reach Redshift. I also can't figure out whether there is a way to force it to use one or the other.

```python
import json
import boto3

def lambda_handler(event, context):
    redshiftClient = boto3.client('redshift', region_name='us-east-1')
    cluster_creds = redshiftClient.get_cluster_credentials(
        DbUser='awsuser',
        DbName='dev',
        ClusterIdentifier='redshift-cluster-1',
        AutoCreate=False,
    )
    print(cluster_creds)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```

My configuration is very simple. The NACL lets everything (0.0.0.0/0) through on all ports and protocols. My security group (SG) does the same.

I have 1 internet gateway defined: igw-0d1e6dcbfdea792b2

I have 1 subnet and 1 routing table in the VPC. The routing table has one rule to map 0.0.0.0/0 --> igw-0d1e6dcbfdea792b2.
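For reference, a VPC route table sends traffic to the most specific matching route, and every route table also contains a local route for the VPC's own CIDR. A minimal sketch of that lookup with Python's ipaddress module, assuming a hypothetical VPC CIDR of 10.0.0.0/16 (the real CIDR is not given above), shows where the Lambda function's API call goes in this configuration: a public address only matches 0.0.0.0/0, so it is handed to the internet gateway. An internet gateway only translates traffic for interfaces that have a public or Elastic IP address.

```python
import ipaddress

def pick_route(routes, destination):
    """Return the target of the most specific route matching destination.

    routes: list of (cidr, target) pairs, e.g. ("0.0.0.0/0", "igw-...").
    Returns None when no route matches.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        # Longest prefix wins, as in a VPC route table.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None

# Hypothetical: 10.0.0.0/16 stands in for the VPC CIDR (an assumption);
# the 0.0.0.0/0 route is the one described in the question.
routes = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "igw-0d1e6dcbfdea792b2"),
]
```

With these routes, `pick_route(routes, "10.0.1.25")` returns `"local"`, while any public address returns the internet gateway ID.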

I am able to connect from outside AWS to the cluster using SQL Workbench/J without issue.

I have looked at many posts, threads and documents, but cannot figure out what is happening:

  • AWS Lambda times out connecting to RedShift
  • Connect Lambda to Redshift in Different Availability Zones
  • https://github.com/awslabs/aws-lambda-redshift-loader/issues/86
  • Accessing Redshift from Lambda - Avoiding the 0.0.0.0/0 Security Group
  • https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/
  • Connecting AWS Lambda to Redshift - Times out after 60 seconds

Please help.

Thanks a lot.

Garet Jax asked Apr 21 '26 23:04


2 Answers

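As per your other question, when an AWS Lambda function is attached to a VPC, it does not receive a public IP address. get_cluster_credentials() is a call to the Redshift API at its regional public endpoint (for example redshift.us-east-1.amazonaws.com), not to the cluster itself, so the function needs a path to the Internet to make it. To provide one, you should: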

  • Add a NAT Gateway in a Public subnet
  • Attach the Lambda function to a Private subnet
  • Set routing on the private subnet to use the NAT Gateway for 0.0.0.0/0
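The three steps above can be scripted. A minimal sketch, assuming a public subnet already exists (its route table sends 0.0.0.0/0 to the internet gateway) and a private route table is already associated with the Lambda function's private subnet; the IDs are placeholders and `ec2` is expected to be a boto3 EC2 client:

```python
def route_private_subnet_via_nat(ec2, public_subnet_id, private_route_table_id):
    """Create a NAT Gateway in the public subnet and point the private
    route table's default route at it. Returns the NAT Gateway ID.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # A NAT Gateway needs an Elastic IP allocated for use in a VPC.
    eip = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,  # must route 0.0.0.0/0 to the IGW
        AllocationId=eip["AllocationId"],
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]
    # Block until the NAT Gateway is available before routing through it.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
    # Default route for the private subnet that the Lambda function uses.
    ec2.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return nat_id
```

For example, `route_private_subnet_via_nat(boto3.client("ec2"), "subnet-PUBLIC", "rtb-PRIVATE")` with your own IDs. Only the Lambda function's subnets should use the private route table; the NAT Gateway itself stays in the public one.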

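It will not work if you have only one subnet: the NAT Gateway has to sit in a subnet whose default route points to the internet gateway, while the Lambda function's subnet has to send 0.0.0.0/0 to the NAT Gateway, and a single subnet's route table cannot do both.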
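The single-subnet problem comes down to one route table needing two different default routes. A hypothetical checker (not an AWS API) that takes each subnet's 0.0.0.0/0 target, as read from its route table, makes the conflict explicit:

```python
def lambda_egress_problems(default_target_by_subnet, lambda_subnet, nat_subnet):
    """Return reasons why Internet egress from lambda_subnet would fail.

    default_target_by_subnet maps subnet ID -> target of that subnet's
    0.0.0.0/0 route ("igw-...", "nat-...", or None if there is none).
    An empty list means the layout looks right.
    """
    problems = []
    if lambda_subnet == nat_subnet:
        problems.append("Lambda function and NAT Gateway share a subnet")
    lambda_target = default_target_by_subnet.get(lambda_subnet) or ""
    nat_target = default_target_by_subnet.get(nat_subnet) or ""
    if not lambda_target.startswith("nat-"):
        problems.append("Lambda subnet's default route does not use a NAT Gateway")
    if not nat_target.startswith("igw-"):
        problems.append("NAT Gateway's subnet has no default route to an internet gateway")
    return problems
```

With the question's layout, `lambda_egress_problems({"subnet-a": "igw-0d1e6dcbfdea792b2"}, "subnet-a", "subnet-a")` reports two problems; a private subnet routed to `nat-...` plus a public subnet routed to `igw-...` reports none.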

I have also had success manually assigning an Elastic IP address to the Lambda function's ENI (instead of using a NAT Gateway), but this will not scale because Lambda might deploy additional containers and therefore additional ENIs. It might be sufficient if the function runs rarely and never concurrently.
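A minimal sketch of that Elastic IP workaround using boto3's `allocate_address` and `associate_address`, assuming you have already found the function's ENI ID by hand (for example in the EC2 console under Network Interfaces); `ec2` is expected to be a boto3 EC2 client. It only covers the one ENI that exists when it runs:

```python
def attach_eip_to_eni(ec2, network_interface_id):
    """Allocate an Elastic IP and associate it with a single ENI.

    `ec2` is expected to be a boto3 EC2 client; network_interface_id is
    the Lambda function's ENI, looked up manually beforehand.
    Returns (public_ip, association_id).
    """
    eip = ec2.allocate_address(Domain="vpc")
    assoc = ec2.associate_address(
        AllocationId=eip["AllocationId"],
        NetworkInterfaceId=network_interface_id,
    )
    return eip["PublicIp"], assoc["AssociationId"]
```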

John Rotenstein answered Apr 25 '26 01:04


You should be able to reach the Redshift API directly from the VPC without an internet gateway or NAT Gateway. This is what AWS PrivateLink is for, and Redshift is supported.
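This also answers the public-versus-private question: get_cluster_credentials is a Redshift API call, so boto3 sends it to the regional API hostname, not to the cluster's own endpoint. A small diagnostic sketch, assuming the standard hostname pattern redshift.<region>.amazonaws.com, resolves that name and labels each address using Python's ipaddress module; it is meant to be run from inside the Lambda function:

```python
import ipaddress
import socket

def redshift_api_host(region):
    # Regional Redshift API hostname that boto3's "redshift" client targets.
    return f"redshift.{region}.amazonaws.com"

def is_private(ip):
    return ipaddress.ip_address(ip).is_private

def resolve_and_classify(host):
    """Resolve host (IPv4, HTTPS port) and label each address private/public."""
    infos = socket.getaddrinfo(host, 443, socket.AF_INET, socket.SOCK_STREAM)
    addrs = sorted({info[4][0] for info in infos})
    return {a: ("private" if is_private(a) else "public") for a in addrs}
```

Printing `resolve_and_classify(redshift_api_host("us-east-1"))` in the original setup should show public addresses; once an interface endpoint with private DNS is in place, it should show addresses from the VPC's own range.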

A generic description of the process (service-specific variations apply):

  • Go to VPC -> Endpoints in AWS console
  • Create a new endpoint
  • Select which service you want to create the endpoint for
    • Configure the endpoint's subnets, security group, etc.
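The console steps above correspond roughly to one EC2 API call. A minimal sketch, assuming the Redshift API service name com.amazonaws.<region>.redshift and placeholder VPC, subnet, and security group IDs; `ec2` is expected to be a boto3 EC2 client. Private DNS also requires the VPC's DNS hostnames and DNS resolution attributes to be enabled:

```python
def create_redshift_interface_endpoint(ec2, vpc_id, subnet_ids,
                                       security_group_ids, region="us-east-1"):
    """Create an interface VPC endpoint (PrivateLink) for the Redshift API.

    `ec2` is expected to be a boto3 EC2 client. The endpoint's security
    groups must allow inbound HTTPS (TCP 443) from the Lambda function.
    Returns the new endpoint ID.
    """
    resp = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.redshift",
        SubnetIds=subnet_ids,
        SecurityGroupIds=security_group_ids,
        # With private DNS, redshift.<region>.amazonaws.com resolves to the
        # endpoint's private IPs inside the VPC.
        PrivateDnsEnabled=True,
    )
    return resp["VpcEndpoint"]["VpcEndpointId"]
```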

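Now, in your code, when you create the client you need to define the region and the endpoint for the client. If private DNS is enabled on the endpoint, the regular regional hostname already resolves to the endpoint's private addresses, so the region alone may be enough.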
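Pointing the client at the endpoint explicitly matters mainly when the interface endpoint was created without private DNS. A minimal sketch that builds keyword arguments for `boto3.client`; the `vpce-...` hostname in the usage below is a hypothetical placeholder, so copy the real DNS name from the endpoint's details:

```python
def redshift_client_kwargs(region, vpce_dns_name=None):
    """Build keyword arguments for boto3.client("redshift", ...).

    vpce_dns_name is the endpoint-specific DNS name of the interface
    endpoint; it is only needed when private DNS is disabled.
    """
    kwargs = {"region_name": region}
    if vpce_dns_name:
        # Send API calls to the VPC endpoint instead of the public hostname.
        kwargs["endpoint_url"] = f"https://{vpce_dns_name}"
    return kwargs
```

Usage: `boto3.client("redshift", **redshift_client_kwargs("us-east-1", "vpce-0123example.redshift.us-east-1.vpce.amazonaws.com"))`. Passing `config=botocore.config.Config(connect_timeout=5)` as well makes a misrouted call fail within seconds instead of running into the Lambda timeout.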

Disclaimer: I've not done this for Redshift, but I have done it for STS and it works.

  • Creating an interface endpoint docs

  • Docs for Redshift specifically

  • list of resources that support AWS PrivateLink

Matti Lyra answered Apr 24 '26 23:04