How to transfer Postgres Data from Query to S3 Efficiently

My Node.js app service currently hits Postgres with a provided SELECT query, converts the result to CSV, and then uploads that CSV file to S3.

I'm wondering if there is a better approach that works more efficiently for larger data pulls?

asked Aug 07 '20 by awhipp



1 Answer

This should get you ~90% there. I haven't tested this exact implementation and there may be a typo or two, but I have similar code running in production right now.

const { Transform } = require('json2csv');
const { Client, Query } = require('pg');
const { S3 } = require('aws-sdk');
const { PassThrough } = require('stream'); // note the capital T

const client = new Client();
const s3 = new S3({ region: 'us-east-1' });

// json2csv options: which columns to emit, plus options for the underlying
// stream.Transform (objectMode lets us write row objects directly).
const opts = { fields: ['field1', 'field2', 'field3'] };
const transformOpts = { highWaterMark: 8192, encoding: 'utf-8', objectMode: true };

// Row objects written to `transform` come out of `passthrough` as CSV text.
const transform = new Transform(opts, transformOpts);
const passthrough = new PassThrough();
transform.pipe(passthrough);

client.connect();

// Submitting a Query instance (rather than a plain string) makes pg emit
// one 'row' event per row instead of buffering the whole result set.
const query = new Query('SELECT field1, field2, field3 FROM table');
client.query(query);

query.on('row', row => {
  transform.write(row); // write through the transform so the row is converted to CSV
  console.log('row!', row); // { field1: 1, field2: 2, field3: 3 }
});
query.on('end', () => {
  transform.end(); // flush remaining output and close the CSV stream
  console.log('query done');
});
query.on('error', err => {
  transform.destroy(err);
  console.error(err.stack);
});

// s3.upload accepts a stream as Body and performs a multipart upload under
// the hood, so the CSV is sent to S3 as it is produced.
s3.upload({ Body: passthrough, Key: 'somefile.csv', Bucket: 'some_bucket' })
  .send((err, data) => {
    if (err) {
      console.error({ err });
      passthrough.destroy(err);
    } else {
      console.log(`File uploaded and available at ${data.Location}`);
      passthrough.destroy();
    }
  });
answered Sep 29 '22 by EddieDean