 

A way to export psql table (or query) directly to AWS S3 as file (csv, json)

This answer suggests using AWS Data Pipeline, but I'm wondering if there's a clever way to do it with my own machine and Python.

I've been using the psycopg2, boto3 and pandas libraries. The tables have 5 to 50 columns and a few million rows. My current method doesn't work that well with large data.

Tsingis asked Mar 06 '23 21:03


2 Answers

Guess I can show one of my own versions here as well, which is based on copy_expert in psycopg2:

import io
import psycopg2
import boto3

resource = boto3.resource('s3')
# placeholder connection parameters (db, user, pw, host)
conn = psycopg2.connect(dbname=db, user=user, password=pw, host=host)
cur = conn.cursor()

def copyFun(bucket, select_query, filename):
    # COPY ... TO STDOUT streams the table (or query) contents as CSV to the client
    query = f"""COPY {select_query} TO STDOUT \
            WITH (FORMAT csv, DELIMITER ',', QUOTE '"', HEADER TRUE)"""
    file = io.StringIO()
    # copy_expert writes the CSV output into the in-memory buffer
    cur.copy_expert(query, file)
    # upload the buffer contents to S3 as <filename>.csv
    resource.Object(bucket, f'{filename}.csv').put(Body=file.getvalue())
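
A hypothetical call could look like this; the bucket name and query are placeholders, and COPY accepts either a table name or a parenthesized SELECT:

    # export the query result to s3://my-bucket/my_export.csv (placeholder names)
    copyFun('my-bucket', '(SELECT * FROM my_schema.example)', 'my_export')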
Tsingis answered Mar 12 '23 15:03


We do the following in our case. Performance-wise it's pretty fast, and it's a scheduled method rather than continuous streaming. I'm not 100% sure it's the wisest method, but it's definitely good from a speed perspective for scheduled data exports in CSV format that we eventually use for loading into the data warehouse.

  • Using a shell script, we fire a psql command to copy data to a local file on the EC2 app instance.

    psql [your connection options go here] -F, -A -c 'select * from my_schema.example' >example.csv

  • Then, using a shell script, we fire an s3cmd command to put example.csv to the designated S3 bucket location (a rough Python sketch of both steps follows the list).

    s3cmd put example.csv s3://your-bucket/path/to/file/
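
If you would rather keep it in Python, a minimal sketch of the same two steps could look like the following; the connection options, bucket name, and key are placeholders, and it assumes psql is on the PATH and boto3 credentials are already configured:

    import subprocess
    import boto3

    # Step 1: dump the query result to a local CSV with psql (placeholder connection options)
    with open('example.csv', 'w') as out:
        subprocess.run(
            ['psql', '-h', 'your-host', '-U', 'your-user', '-d', 'your-db',
             '-F', ',', '-A', '-c', 'select * from my_schema.example'],
            stdout=out, check=True)

    # Step 2: upload the file to S3 (the equivalent of the s3cmd put above)
    boto3.client('s3').upload_file('example.csv', 'your-bucket', 'path/to/file/example.csv')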

Red Boy answered Mar 12 '23 15:03