Using node.js, with the intention of running this module as an AWS Lambda function.
Using s3.getObject() from aws-sdk, I am able to successfully pick up a very large CSV file from Amazon S3. The intention is to read each line in the file and emit an event with the body of each line.
In all examples I could find, it looks like the entire CSV file in S3 has to be buffered or streamed, converted to a string and then read line by line.
s3.getObject(params, function (err, data) {
  // the whole object body is buffered in memory before it can be stringified
  var body = data.Body.toString('utf-8');
});
This operation takes a very long time, given the size of the source CSV file. Also, the CSV rows are of varying length, and I'm not certain if I can use the buffer size as an option.
Question
Is there a way to pick up the S3 file in node.js and read/transform it line by line, without stringifying the entire file in memory first?
Ideally, I'd prefer to use the capabilities of fast-csv and/or node-csv, instead of looping over lines manually.
You should just be able to use the createReadStream method and pipe it into fast-csv:
const s3Stream = s3.getObject(params).createReadStream()
require('fast-csv').fromStream(s3Stream)
  .on('data', (data) => {
    // do something here with each parsed row
  })
I don't have enough reputation to comment, but as of now the fromStream method used in the accepted answer no longer exists in fast-csv. You'll need to use the parseStream method instead:
const s3Stream = s3.getObject(params).createReadStream()
require('fast-csv').parseStream(s3Stream)
  .on('data', (data) => {
    // use rows
  })
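For completeness, here is a minimal sketch of how the pieces could fit together to emit an event per row, which is what the question asks for. It assumes the aws-sdk v2 client, a fast-csv version that exposes parseStream (3.x or later), and a hypothetical EventEmitter used as the event mechanism; the bucket and key names are placeholders, so adjust them to your setup.

const AWS = require('aws-sdk')
const csv = require('fast-csv')
const { EventEmitter } = require('events')

const s3 = new AWS.S3()
const emitter = new EventEmitter() // hypothetical emitter; swap in your own event mechanism

// placeholder bucket/key pointing at the large CSV object
const params = { Bucket: 'my-bucket', Key: 'very-large.csv' }

// createReadStream() streams the object body, so the file is never
// buffered into memory as a single string
const s3Stream = s3.getObject(params).createReadStream()

csv.parseStream(s3Stream, { headers: true })
  .on('error', (err) => emitter.emit('error', err))
  .on('data', (row) => emitter.emit('line', row))     // one event per CSV row
  .on('end', (rowCount) => emitter.emit('done', rowCount))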