I am trying to write a script that collects the schema from an AWS Aurora Serverless MySQL database table, collects the column headers from a CSV file stored in an AWS S3 bucket, and writes the CSV to the table only if its column headers are a subset of the schema (e.g., if the table fields are ['Name', 'DOB', 'Height'] but the CSV fields are ['Name', 'DOB', 'Weight'], the script will throw an exception).
Thus far, I have successfully returned the table schema in an AWS Lambda function, and successfully read in the CSV file, but I am not sure how to get the column headers from the S3 object.
import boto3
import pymysql

def return_db_schema(event):
    # Query the Aurora table's column names via SHOW COLUMNS.
    schema = []
    conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
    with conn.cursor() as cur:
        cur.execute('SHOW columns FROM SampleTable')
        # Iterate the result set while the cursor is still open;
        # the first field of each row is the column name.
        for row in cur:
            schema.append(row[0])
    return schema
def return_csv_cols(event):
    # Fetch the CSV object from S3; this is where I am stuck --
    # how do I get just the column headers from the object body?
    s3 = boto3.client('s3')
    tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
    contents = tester['Body'].read()
def main(event, context):
    print(return_db_schema(event))
    print()
    print(return_csv_cols(event))
I am unsure how to proceed from here (e.g., is there a way to do this without loading the CSV into a pandas DataFrame and checking df.columns, or something like that?).
I have solved this with the following code:
def return_csv_cols(event):
    s3 = boto3.client('s3')
    tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
    # Decode the object body to text, then take the first line as the header row.
    contents = tester['Body'].read().decode('UTF-8')
    cols = contents.split('\n')[0].split(',')
    return cols, contents
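With the schema and the CSV headers in hand, the subset check described in the question is straightforward. Below is a minimal sketch of how the two could be compared in main; the strip() normalisation and the ValueError are my own choices, not part of the original code:

def main(event, context):
    schema = return_db_schema(event)
    cols, contents = return_csv_cols(event)
    # Strip stray whitespace (e.g. a trailing '\r' from CRLF line endings) before comparing.
    csv_fields = {c.strip() for c in cols}
    if not csv_fields.issubset(set(schema)):
        unexpected = csv_fields - set(schema)
        raise ValueError(f'CSV columns not in table schema: {sorted(unexpected)}')
    # Safe to write `contents` to the table from here.

Note that splitting on ',' is fine for simple files; if the CSV ever contains quoted fields with embedded commas, the standard library's csv.reader would parse the header row more robustly.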