I am trying to write a script that collects the schema from an AWS Aurora Serverless MySQL database table, collects the column headers from a CSV file stored in an AWS S3 bucket, and writes the CSV to the table only if its column headers are a subset of the schema (e.g., if the table fields are ['Name', 'DOB', 'Height'] but the CSV fields are ['Name', 'DOB', 'Weight'], the script will throw an exception).
Thus far, I have successfully returned the table schema in an AWS Lambda function, and successfully read in the CSV file, but I am not sure how to get the column headers from the S3 object.
import boto3
import pymysql

def return_db_schema(event):
    # Query the Aurora table's column names via SHOW COLUMNS.
    schema = []
    conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
    with conn.cursor() as cur:
        cur.execute('SHOW columns FROM SampleTable')
        # Iterate the result set while the cursor is still open;
        # the first field of each row is the column name.
        for row in cur:
            schema.append(row[0])
    return schema
def return_csv_cols(event):
    # Fetch the CSV object from S3; this is where I am stuck --
    # how do I get just the column headers from the object body?
    s3 = boto3.client('s3')
    tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
    contents = tester['Body'].read()
def main(event, context):
    print(return_db_schema(event))
    print()
    print(return_csv_cols(event))
I am unsure how to proceed from here (e.g., is there a way to do this without loading the CSV into a pandas DataFrame and checking df.columns, or something like that?).
I have solved this with the following code:
def return_csv_cols(event):
    s3 = boto3.client('s3')
    tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
    # Decode the object body to text, then take the first line as the header row.
    contents = tester['Body'].read().decode('UTF-8')
    cols = contents.split('\n')[0].split(',')
    return cols, contents
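With the schema and the CSV headers in hand, the subset check described in the question is straightforward. Below is a minimal sketch of how the two could be compared in main; the strip() normalisation and the ValueError are my own choices, not part of the original code:

def main(event, context):
    schema = return_db_schema(event)
    cols, contents = return_csv_cols(event)
    # Strip stray whitespace (e.g. a trailing '\r' from CRLF line endings) before comparing.
    csv_fields = {c.strip() for c in cols}
    if not csv_fields.issubset(set(schema)):
        unexpected = csv_fields - set(schema)
        raise ValueError(f'CSV columns not in table schema: {sorted(unexpected)}')
    # Safe to write `contents` to the table from here.

Note that splitting on ',' is fine for simple files; if the CSV ever contains quoted fields with embedded commas, the standard library's csv.reader would parse the header row more robustly.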