
Unable to read accented characters from CSV file stream in Node

To start off, I am currently using the npm package fast-csv, which is a nice CSV reader/writer that is pretty straightforward and simple. What I'm attempting to do is use it in conjunction with iconv to process accented and other non-ASCII characters, and either convert them to an ASCII equivalent or remove them, depending on the character.

My current process with fast-csv is to bring in a chunk for processing (it comes in as one row) via a read stream, pause the read stream, process the data, pipe the data to a write stream, and then resume the read stream using a callback. Fast-csv knows where to separate the chunks based on the format of the data coming in from the read stream.

The entire process looks like this:

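```javascript
var fs = require('fs');
var csv = require('fast-csv');

// inputFileName, outputFileName and processRow(data, done) are defined elsewhere
var stream = fs.createReadStream(inputFileName);
var csvWrite, writableStream, csvStream;

function csvPull(source) {
    // CSV writer (emits a header row) piped to the output file
    csvWrite = csv.createWriteStream({ headers: true });
    writableStream = fs.createWriteStream(outputFileName);
    // CSV parser: emits one "data" event per parsed row
    csvStream = csv()
        .on("data", function (data) {
            // Pause parsing until this row has been processed
            csvStream.pause();
            processRow(data, function () {
                csvStream.resume();
            });
        })
        .on("end", function () {
            console.log('END OF CSV FILE');
        });
    csvWrite.pipe(writableStream);
    source.pipe(csvStream);
}

csvPull(stream);
```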
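processRow is not shown in the question. As a hypothetical illustration of the per-field clean-up described at the start (convert accented characters to an ASCII equivalent, remove the rest), here is a minimal sketch that uses the built-in String.prototype.normalize instead of iconv. The helper name toAscii is made up for this example, and it assumes the text has already been decoded correctly; characters with no decomposition (such as ß or €) are simply dropped rather than transliterated.

```javascript
// Hypothetical helper: fold accented Latin characters to ASCII, then drop
// anything still outside the ASCII range. Uses String.prototype.normalize,
// not iconv.
function toAscii(value) {
  return value
    .normalize('NFD')                // split "é" into "e" + U+0301 (combining acute)
    .replace(/[\u0300-\u036f]/g, '') // remove combining diacritical marks
    .replace(/[^\x00-\x7f]/g, '');   // remove any remaining non-ASCII characters
}

console.log(toAscii('Crème Brûlée'));              // Creme Brulee
console.log(toAscii('Straße €5'));                 // Strae 5
console.log(['Zoë', 'Ångström'].map(toAscii));     // [ 'Zoe', 'Angstrom' ]
```

Since csv() is called without a headers option in the question, each parsed row should arrive as an array, so a helper like this could be applied as data.map(toAscii) before writing the row out.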

The problem I'm running into is that, for some reason, non-ASCII characters are not recognised when my script reads the file, so I am resorting to using the npm package iconv-lite to convert the data stream into something usable as it comes in. However, this presents a bigger issue: fast-csv no longer knows where to split the chunks (rows) in the converted data. Given the size of the CSVs I will be working with, loading the entire CSV into a buffer and then decoding it is not an option.
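Decoding before parsing does not by itself have to lose row boundaries, as long as the decoder carries incomplete multi-byte sequences over to the next chunk. The sketch below illustrates that with Node's built-in StringDecoder. makeLineDecoder is a hypothetical helper, and it splits only on '\n', so it is not a CSV parser (quoted fields containing line breaks would be split incorrectly); memory use is bounded by the longest line rather than by the file size.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: turns raw byte chunks (as a read stream delivers
// them) into complete lines. Splits on '\n' only, so it is NOT a CSV parser.
function makeLineDecoder(encoding) {
  const decoder = new StringDecoder(encoding); // buffers partial multi-byte chars
  let pending = '';                            // text after the last '\n' seen so far

  return {
    // Decode one chunk and return the lines it completed.
    push(chunk) {
      pending += decoder.write(chunk);
      const lines = pending.split('\n');
      pending = lines.pop(); // last piece is an unfinished line (possibly '')
      return lines;
    },
    // Flush whatever is left once the input is finished.
    end() {
      const rest = pending + decoder.end();
      pending = '';
      return rest === '' ? [] : [rest];
    },
  };
}

// "name\ncafé\n" as UTF-8, cut between the two bytes of "é" (0xC3 0xA9)
const bytes = Buffer.from('name\ncafé\n', 'utf8');
const splitAt = bytes.indexOf(0xc3) + 1;
const lines = makeLineDecoder('utf8');
console.log(lines.push(bytes.subarray(0, splitAt))); // [ 'name' ]
console.log(lines.push(bytes.subarray(splitAt)));    // [ 'café' ]
console.log(lines.end());                            // []
```

Setting an encoding on fs.createReadStream performs the decoding step in much the same way, because readable streams use a StringDecoder internally once an encoding is set.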

Are there any suggestions on how I might get around this without writing my own CSV parser into my code?

asked Oct 21 '15 at 15:10 by JSArrakis


1 Answer

Try reading your file with 'binary' as the encoding option. I had to read a few CSV files containing accented characters, and it worked fine with that.

```javascript
var stream = fs.createReadStream(inputFileName, { encoding: 'binary' });
```
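A small sketch of why this can help, assuming the CSV was saved in Latin-1 (ISO-8859-1): in Node, 'binary' is an alias for 'latin1', which maps each byte to the character with the same code (0x00 to 0xFF). With an encoding set, the read stream emits strings, so fast-csv still sees the line breaks and splits rows as before. The same setting garbles a file that is actually UTF-8, and Windows-1252 bytes 0x80 to 0x9F (for example 0x80 for €) come out as control characters rather than the intended symbols.

```javascript
// The bytes below spell "café" in Latin-1, where é is the single byte 0xE9.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

console.log(latin1Bytes.toString('binary')); // café
console.log(latin1Bytes.toString('utf8'));   // "caf" + U+FFFD (replacement character)

// The same word saved as UTF-8 stores é as two bytes (0xC3 0xA9), so reading
// a UTF-8 file as 'binary' turns each of those bytes into its own character.
const utf8Bytes = Buffer.from('café', 'utf8');
console.log(utf8Bytes.toString('binary'));   // cafÃ©
console.log(utf8Bytes.toString('utf8'));     // café
```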
answered Nov 15 '22 at 19:11 by Shanoor