Using node.js, with the intention of running this module as an AWS Lambda function.
Using s3.getObject() from aws-sdk, I am able to successfully pick up a very large CSV file from Amazon S3. The intention is to read each line in the file and emit an event with the body of each line.
In all examples I could find, it looks like the entire CSV file in S3 has to be buffered or streamed, converted to a string and then read line by line.
s3.getObject(params, function (err, data) {
  // the whole object body is buffered in memory before it can be stringified
  var body = data.Body.toString('utf-8');
});
This operation takes a very long time, given the size of the source CSV file. Also, the CSV rows are of varying length, and I'm not certain if I can use the buffer size as an option.
Question
Is there a way to pick up the S3 file in node.js and read/transform it line by line, without stringifying the entire file in memory first?
Ideally, I'd prefer to use the capabilities of fast-csv and/or node-csv, instead of looping over lines manually.
You should just be able to use the createReadStream method and pipe it into fast-csv:
const s3Stream = s3.getObject(params).createReadStream()
require('fast-csv').fromStream(s3Stream)
  .on('data', (data) => {
    // do something here with each parsed row
  })
I don't have enough reputation to comment, but as of now the fromStream method used in the accepted answer no longer exists in fast-csv. You'll need to use the parseStream method instead:
const s3Stream = s3.getObject(params).createReadStream()
require('fast-csv').parseStream(s3Stream)
  .on('data', (data) => {
    // use rows
  })
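For completeness, here is a minimal sketch of how the pieces could fit together to emit an event per row, which is what the question asks for. It assumes the aws-sdk v2 client, a fast-csv version that exposes parseStream (3.x or later), and a hypothetical EventEmitter used as the event mechanism; the bucket and key names are placeholders, so adjust them to your setup.

const AWS = require('aws-sdk')
const csv = require('fast-csv')
const { EventEmitter } = require('events')

const s3 = new AWS.S3()
const emitter = new EventEmitter() // hypothetical emitter; swap in your own event mechanism

// placeholder bucket/key pointing at the large CSV object
const params = { Bucket: 'my-bucket', Key: 'very-large.csv' }

// createReadStream() streams the object body, so the file is never
// buffered into memory as a single string
const s3Stream = s3.getObject(params).createReadStream()

csv.parseStream(s3Stream, { headers: true })
  .on('error', (err) => emitter.emit('error', err))
  .on('data', (row) => emitter.emit('line', row))     // one event per CSV row
  .on('end', (rowCount) => emitter.emit('done', rowCount))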