 

How can I import bulk data from a CSV file into DynamoDB?

I am trying to import data from a CSV file into AWS DynamoDB.

Here's what my CSV file looks like:

    first_name   last_name
    sri          ram
    Rahul        Dravid
    JetPay       Underwriter
    Anil Kumar   Gurram
Asked Sep 20 '15 by Hemanth Kumar


People also ask

How do you import data from Excel to DynamoDB?

For this, you can store your Excel file in an S3 bucket, then create an AWS Lambda function that reads the file from S3 and transforms each row into the item format your DynamoDB table expects.
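The core of such a Lambda is mapping each parsed spreadsheet row to the PutRequest shape that DynamoDB's BatchWriteItem expects. A minimal sketch of that mapping (the `rowToPutRequest` helper name and column names are illustrative, not from the original):

```javascript
// Map one parsed row (a plain object of column -> value) to the
// PutRequest shape used by DynamoDB's BatchWriteItem.
function rowToPutRequest(row) {
  const item = {};
  for (const [key, value] of Object.entries(row)) {
    // Skip empty strings, which DynamoDB may reject as attribute values.
    if (value !== '') item[key] = value;
  }
  return { PutRequest: { Item: item } };
}
```

You would collect these PutRequests into groups and pass each group to the DynamoDB client, as the answers below do.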




2 Answers

In which language do you want to import the data? I just wrote a function in Node.js that can import a CSV file into a DynamoDB table. It first parses the whole CSV into an array, splits the array into chunks of 25, and then calls batchWriteItem on the table for each chunk.

Note: DynamoDB only allows writing up to 25 records at a time in a batch write, so we have to split our array into chunks.
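The chunking step can be sketched as a small standalone helper (the `chunk` name is mine; the answer below uses splice on the parsed array instead):

```javascript
// Split an array into consecutive chunks of at most `size` elements,
// without mutating the input array.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// e.g. 60 parsed CSV rows with size 25 yield chunks of 25, 25 and 10 rows.
```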

    var fs = require('fs');
    var parse = require('csv-parse');
    var async = require('async');

    var csv_filename = "YOUR_CSV_FILENAME_WITH_ABSOLUTE_PATH";

    // `ddb` is assumed to be a DynamoDB client configured elsewhere;
    // the original snippet uses it without defining it.
    var rs = fs.createReadStream(csv_filename);
    var parser = parse({
        columns : true,
        delimiter : ','
    }, function(err, data) {
        // Split the parsed rows into chunks of 25 (the batch write limit).
        var split_arrays = [], size = 25;
        while (data.length > 0) {
            split_arrays.push(data.splice(0, size));
        }
        var data_imported = false;
        var chunk_no = 1;

        async.each(split_arrays, function(item_data, callback) {
            ddb.batchWriteItem({
                "TABLE_NAME" : item_data
            }, {}, function(err, res, cap) {
                console.log('done going next');
                if (err == null) {
                    console.log('Success chunk #' + chunk_no);
                    data_imported = true;
                } else {
                    console.log(err);
                    console.log('Fail chunk #' + chunk_no);
                    data_imported = false;
                }
                chunk_no++;
                callback();
            });
        }, function() {
            // Runs after all chunks have been processed.
            console.log('all data imported....');
        });
    });
    rs.pipe(parser);
Answered Oct 03 '22 by Hassan Siddique


Updated 2019 Javascript code

I didn't have much luck with any of the Javascript code samples above. Starting from Hassan Siddique's answer, I've updated it to the latest API, included sample credential code, moved all user config to the top, added a uuid() when the id is missing, and stripped out blank strings.

    const fs = require('fs');
    const parse = require('csv-parse');
    const async = require('async');
    const uuid = require('uuid/v4');
    const AWS = require('aws-sdk');

    // --- start user config ---
    const AWS_CREDENTIALS_PROFILE = 'serverless-admin';
    const CSV_FILENAME = "./majou.csv";
    const DYNAMODB_REGION = 'eu-central-1';
    const DYNAMODB_TABLENAME = 'entriesTable';
    // --- end user config ---

    const credentials = new AWS.SharedIniFileCredentials({
      profile: AWS_CREDENTIALS_PROFILE
    });
    AWS.config.credentials = credentials;
    const docClient = new AWS.DynamoDB.DocumentClient({
      region: DYNAMODB_REGION
    });

    const rs = fs.createReadStream(CSV_FILENAME);
    const parser = parse({
      columns: true,
      delimiter: ','
    }, function(err, data) {
      // Split the parsed rows into chunks of 25 (the batch write limit).
      var split_arrays = [],
        size = 25;
      while (data.length > 0) {
        split_arrays.push(data.splice(0, size));
      }
      var data_imported = false;
      var chunk_no = 1;

      async.each(split_arrays, function(item_data, callback) {
        const params = {
          RequestItems: {}
        };
        params.RequestItems[DYNAMODB_TABLENAME] = [];
        item_data.forEach(item => {
          for (const key of Object.keys(item)) {
            // An AttributeValue may not contain an empty string
            if (item[key] === '')
              delete item[key];
          }

          params.RequestItems[DYNAMODB_TABLENAME].push({
            PutRequest: {
              Item: {
                id: uuid(),
                ...item
              }
            }
          });
        });

        docClient.batchWrite(params, function(err, res, cap) {
          console.log('done going next');
          if (err == null) {
            console.log('Success chunk #' + chunk_no);
            data_imported = true;
          } else {
            console.log(err);
            console.log('Fail chunk #' + chunk_no);
            data_imported = false;
          }
          chunk_no++;
          callback();
        });
      }, function() {
        // Runs after all chunks have been processed.
        console.log('all data imported....');
      });
    });
    rs.pipe(parser);
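One caveat neither snippet handles: a batch write can succeed yet still return UnprocessedItems when some writes are throttled, and those items should be re-submitted. A hedged sketch of deriving the follow-up request (the `nextBatchParams` helper name and retry approach are mine, not from the answer):

```javascript
// Given a BatchWriteItem/batchWrite result, build the params object
// for retrying any UnprocessedItems, or return null when nothing is left.
function nextBatchParams(result) {
  const unprocessed = (result && result.UnprocessedItems) || {};
  if (Object.keys(unprocessed).length === 0) return null;
  return { RequestItems: unprocessed };
}
```

In the callback above, you could loop on `nextBatchParams(res)` (ideally with a backoff delay) until it returns null before calling `callback()`.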
Answered Oct 03 '22 by gadicc