I am importing an Excel file into a PostgreSQL database with Express and node-pg.
Currently I am looping through the Excel rows and executing an INSERT for every row, but I feel it's not the right way:
let arrSQLParams = [];

workbook.xlsx.readFile(excel_file).then(function () {
    // Get the first worksheet
    const worksheet = workbook.getWorksheet(1);
    // Loop through all rows
    worksheet.eachRow(function (row, rowNumber) {
        // Commit to DB only from row 2 and up, to exclude the header row
        if (rowNumber > 1) {
            // Loop through all cells and build the parameter array for the query
            row.eachCell(function (cell, colNumber) {
                arrSQLParams.push(cell.value);
            });
            // Add the user id from the session to the array
            arrSQLParams.push(user);
            // Insert into DB
            db.query(strSQL, arrSQLParams, function (err, result) {
                if (err) {
                    console.log(err);
                    ret = false;
                }
            });
            // Empty the array for the next query
            arrSQLParams = [];
        }
    });
});
Is there a better way to do this to improve performance?
Consider using the PostgreSQL COPY command to load data from one or more files. COPY is optimized for bulk data loads: it is far more efficient than running a large number of individual INSERT statements, or even multi-valued INSERTs.
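As a rough sketch of that approach from Node (the table name `my_table`, its columns, and the `pg-copy-streams` package are assumptions here, not part of the original question), you could convert the worksheet rows to CSV and stream them through COPY ... FROM STDIN:

```js
const { from: copyFrom } = require('pg-copy-streams');
const { pipeline } = require('stream/promises');
const { Readable } = require('stream');

// Hypothetical helper: yield CSV lines from the worksheet, skipping the header row.
// Note: this naive join does no CSV escaping; real cell values containing
// commas, quotes, or newlines (or exceljs rich-text/date objects) need handling.
function* toCsvLines(worksheet, user) {
    for (let i = 2; i <= worksheet.rowCount; i++) {
        const values = worksheet.getRow(i).values.slice(1); // exceljs values are 1-based
        yield values.concat(user).join(',') + '\n';
    }
}

async function bulkCopy(client, worksheet, user) {
    // COPY ... FROM STDIN streams all rows to the server in a single command
    const dbStream = client.query(
        copyFrom('COPY my_table (col_a, col_b, user_id) FROM STDIN WITH (FORMAT csv)')
    );
    await pipeline(Readable.from(toCsvLines(worksheet, user)), dbStream);
}
```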
PostgreSQL also supports inserting multiple rows in one statement. First, specify the name of the table you want to insert data into after the INSERT INTO keywords. Second, list the required columns (or all columns) of the table in parentheses following the table name. Third, supply a comma-separated list of rows after the VALUES keyword.
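A minimal sketch of building such a statement with node-pg (the table and column names are placeholders for the actual Excel columns):

```js
// rows: array of parameter arrays, e.g. [[a1, b1, user], [a2, b2, user], ...]
function multiRowInsert(db, rows) {
    const params = [];
    const placeholders = rows.map((row) => {
        const group = row.map((value) => {
            params.push(value);
            return '$' + params.length; // $1, $2, ...
        });
        return '(' + group.join(', ') + ')';
    });
    // Note: PostgreSQL caps a statement at 65535 bind parameters,
    // so chunk large imports into batches.
    const sql = 'INSERT INTO my_table (col_a, col_b, user_id) VALUES '
        + placeholders.join(', ');
    return db.query(sql, params);
}
```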
The simple answer is: it's not possible with these libraries. The underlying Postgres COPY protocol only supports inserts, not upserts. In SQL Server I've always used table-valued parameters (TVPs) to send a batch of data over the wire and perform a MERGE statement on the bulk data.
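A common Postgres workaround for that limitation (sketched here with placeholder table and column names) is to COPY into a staging table first and then merge into the target with INSERT ... ON CONFLICT:

```js
async function copyThenMerge(client) {
    // Assumes the raw rows were already COPY'd into staging_table (e.g. as above),
    // and that my_table has a unique constraint on id.
    await client.query(`
        INSERT INTO my_table (id, col_a, col_b)
        SELECT id, col_a, col_b FROM staging_table
        ON CONFLICT (id) DO UPDATE
        SET col_a = EXCLUDED.col_a,
            col_b = EXCLUDED.col_b
    `);
}
```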
In one benchmark, at 200 million rows the insert rate in plain PostgreSQL averaged 30K rows per second and only got worse; at 1 billion rows, it averaged 5K rows per second. TimescaleDB, on the other hand, sustained an average insert rate of 111K rows per second through 1 billion rows of data, a 20x improvement.
Following the clarification provided by the author: to insert up to 1,000 records at a time, the solution suggested in Multi-row insert with pg-promise is exactly what the author needs, in terms of both performance and flexibility.
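A minimal sketch of that approach with pg-promise (the connection string, table, and column names are placeholders for the actual schema):

```js
const pgp = require('pg-promise')();
const db = pgp('postgres://user:pass@localhost:5432/mydb'); // placeholder connection

// Describe the target columns once; ColumnSet objects are reusable and fast.
const cs = new pgp.helpers.ColumnSet(['col_a', 'col_b', 'user_id'], { table: 'my_table' });

async function insertRows(rows) {
    // rows: [{ col_a: ..., col_b: ..., user_id: ... }, ...]
    // helpers.insert generates a single multi-row INSERT statement from the data.
    const query = pgp.helpers.insert(rows, cs);
    await db.none(query);
}
```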
UPDATE
A must-read article: Data Imports.