I have the below code where I am reading from a CSV and writing to another CSV. I will be transforming some data before writing to the other file, but as a test I ran the code as-is and noticed slight differences between the source and destination files without even changing anything about the data.
for (const m of metadata) {
    tempm = m;
    fname = path;
    const pipelineAsync = promisify(pipeline);
    if (m.path) {
        await pipelineAsync(
            fs.createReadStream(m.path),
            csv.parse({ delimiter: '\t', columns: true }),
            csv.transform((input) => {
                return Object.assign({}, input);
            }),
            csv.stringify({ header: true, delimiter: '\t' }),
            fs.createWriteStream(fname, { encoding: 'utf16le' })
        );
        let nstats = fs.statSync(fname);
        tempm['transformedPath'] = fname;
        tempm['transformed'] = true;
        tempm['t_size_bytes'] = nstats.size;
    }
}
I see, for example:
file a: the source file size is `895631` while the destination file size after copying is `898545`
file b: the source file size is `51388` while the destination file size after copying is `52161`
file c: the source file size is `13666` while the destination file size after copying is `13587`
But when I do not use the transform, the sizes match. For example, this code produces exactly the same file sizes on both source and destination:
for (const m of metadata) {
    tempm = m;
    fname = path;
    const pipelineAsync = promisify(pipeline);
    if (m.path) {
        await pipelineAsync(
            fs.createReadStream(m.path),
            /*csv.parse({ delimiter: '\t', columns: true }),
            csv.transform((input) => {
                return Object.assign({}, input);
            }),
            csv.stringify({ header: true, delimiter: '\t' }),*/
            fs.createWriteStream(fname, { encoding: 'utf16le' })
        );
        let nstats = fs.statSync(fname);
        tempm['transformedPath'] = fname;
        tempm['transformed'] = true;
        tempm['t_size_bytes'] = nstats.size;
    }
}
Can anyone please help identify what options I need to pass to the csv transform so that the copy happens correctly?
I am doing this test to ensure I am not losing any data in large files.
Thanks.
Update 1: I have also checked that the encoding on both files is the same.
Update 2: I notice that the source file has CRLF and the destination file has LF. Is there a way I can keep them the same using Node.js, or is it OS dependent?
Update 3: Looks like the issue is the EOL: the source file has CRLF while the destination/transformed file has LF. I now need to find a way to specify this in my code above so that the EOL is consistent.
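A minimal sketch to verify this, counting CRLF pairs vs bare LFs in each file (the file names are placeholders; it assumes both files are utf16le, as in the code above):

const fs = require('fs');

// Count CRLF pairs and bare LFs in a file (assumes utf16le encoding).
function countEols(file) {
    const text = fs.readFileSync(file, 'utf16le');
    const crlf = (text.match(/\r\n/g) || []).length;
    const lf = (text.match(/\n/g) || []).length - crlf; // bare LFs only
    return { crlf, lf };
}

console.log(countEols('source.tsv'));      // expect all CRLF here
console.log(countEols('transformed.tsv')); // and all bare LF here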
You need to set up your EOL config:
const { pipeline } = require('stream')
const { promisify } = require('util')
const fs = require('fs')
const csv = require('csv')
const os = require('os')

;(async function () {
    const pipelineAsync = promisify(pipeline)
    await pipelineAsync(
        fs.createReadStream('out'),
        csv.parse({ delimiter: ',', columns: true }),
        csv.transform((input) => {
            return Object.assign({}, input)
        }),
        // Here is the trick: record_delimiter controls the output EOL
        csv.stringify({ record_delimiter: os.EOL, header: true, delimiter: '\t' }),
        fs.createWriteStream('out2', { encoding: 'utf16le' })
    )
})()
You can pass '\r\n' explicitly as well, or whatever newline you need (see the variation below).
This option can be spotted by reading the csv-stringify source code.
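For example, a small variation on the snippet above that pins the output to Windows-style CRLF regardless of the host OS (same hypothetical 'out'/'out2' file names):

const { pipeline } = require('stream')
const { promisify } = require('util')
const fs = require('fs')
const csv = require('csv')

;(async function () {
    const pipelineAsync = promisify(pipeline)
    await pipelineAsync(
        fs.createReadStream('out'),
        csv.parse({ delimiter: ',', columns: true }),
        // Pin the record delimiter to CRLF explicitly instead of os.EOL,
        // so the output matches a CRLF source even on Linux/macOS
        csv.stringify({ record_delimiter: '\r\n', header: true, delimiter: '\t' }),
        fs.createWriteStream('out2', { encoding: 'utf16le' })
    )
})()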