Apps Script Utilities.parseCsv assumes new row on line breaks within double quotes

When using Utilities.parseCsv(), line breaks inside double-quoted fields are treated as the start of a new row, so the array returned by the function contains several incorrect rows.

How can I fix this, or work around it?

Edit: Specifically, can I escape line breaks that exist only within double quotes? For example:

\r\n "I have some stuff to do:\r\n Go home\r\n Take a Nap"\r\n

would be escaped to:

\r\n "I have some stuff to do:\\r\\n Go home\\r\\n Take a Nap"\r\n

Edit2: Bug report from 2012: https://code.google.com/p/google-apps-script-issues/issues/detail?id=1871

Douglas Gaskell asked Apr 16 '16 00:04


1 Answer

I had a fairly large CSV file (about 10 MB, 50k rows) in which the last field of each row holds free-text user comments containing all sorts of characters. The proposed regex solution worked when I tested it on a small subset of the rows, but on the full file it failed again, and while experimenting with the regex I even managed to crash the whole runtime.

BTW I'm running my code on the V8 runtime.

After scratching my head for about an hour, with no really helpful error messages from the Apps Script runtime, I had an idea: what if some users were using backslashes in odd ways that made some of the escapes go wrong? So I tried temporarily replacing all backslashes in my data with something else, just until I had the array that parseCsv() returns. It worked! My hypothesis is that a \ at the end of a line was breaking the replacement.
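That hypothesis can be checked outside Apps Script with a small sketch. It uses only the double-quoted branch of the regex from this thread; the names QUOTED and sample are made up for illustration, and the sample row is invented, not taken from the real file.

```javascript
// Hypothetical name: the double-quoted branch of the regex used in this thread.
const QUOTED = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;

// Invented sample: two rows, where the quoted field in row 1 ends with a backslash.
const sample = '1,"ends with \\"\r\n2,"x"';

// Unprotected: the regex reads \" as an escaped quote, so the match does not
// stop at the real end of the field and runs on into the next row.
const raw = sample.match(QUOTED);

// Backslashes swapped for a placeholder first: each field matches on its own.
const safe = sample.replace(/\\/g, '::back-slash::').match(QUOTED);

console.log(JSON.stringify(raw));
console.log(JSON.stringify(safe));
```

Here raw holds a single match that spans the line break between the two rows, so the newline replacement would merge them into one row. safe holds two separate matches and neither contains a line break.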

So my final solution is:

function testParse() {
    let csv =
        '"title1","title2","title3"\r\n' +
        '1,"person1","A ""comment"" with a \\ and \\\r\n a second line"\r\n' +
        '2,"person2","Another comment"';

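    // 1. Swap every backslash for a placeholder, so that a "\" right before
    //    a closing quote cannot be read as an escape by the regex below.
    // 2. Replace line breaks inside double-quoted fields with a placeholder.
    //    (The malformed single-quote branch is dropped; only the
    //    double-quoted branch is needed for this data.)
    let sanitizedString =
        csv.replace(/\\/g, '::back-slash::')
            .replace(/"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g,
                match => match.replace(/\r?\n/g, '::newline::'));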
    let arr = Utilities.parseCsv(sanitizedString);
    // Restore the original characters in every parsed cell.
    for (let i = 0, rows = arr.length; i < rows; i++) {
        for (let j = 0, cols = arr[i].length; j < cols; j++) {
            arr[i][j] =
                arr[i][j].replace(/::back-slash::/g, '\\')
                    .replace(/::newline::/g, '\r\n');
        }
    }
    Logger.log(arr);
}

Output:

[20-02-18 11:29:03:980 CST] [[title1, title2, title3], [1, person1, A "comment" with a \ and \
 a second line], [2, person2, Another comment]]
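
If the regex is still too heavy for a very large file, the line-break protection can also be done without a regex. This is only a sketch under my own assumptions: protectQuotedNewlines is a made-up helper name, not an Apps Script function, and I have only reasoned about it on the sample data above, not run it against Utilities.parseCsv() on the 10 MB file.

```javascript
// Hypothetical helper, not part of Apps Script: replaces line breaks that sit
// inside double-quoted CSV fields with a placeholder, in one linear pass.
function protectQuotedNewlines(csv, placeholder = '::newline::') {
    let out = '';
    let inQuotes = false;
    for (let i = 0; i < csv.length; i++) {
        const ch = csv[i];
        if (ch === '"') {
            // An escaped quote ("") toggles twice, leaving the state unchanged.
            inQuotes = !inQuotes;
            out += ch;
        } else if (inQuotes && ch === '\r' && csv[i + 1] === '\n') {
            out += placeholder;
            i++; // skip the \n of the \r\n pair
        } else if (inQuotes && ch === '\n') {
            out += placeholder;
        } else {
            out += ch;
        }
    }
    return out;
}

// Same sample data as in testParse() above.
const csv =
    '"title1","title2","title3"\r\n' +
    '1,"person1","A ""comment"" with a \\ and \\\r\n a second line"\r\n' +
    '2,"person2","Another comment"';

const protectedCsv = protectQuotedNewlines(csv);

// Number of physical lines left after protection.
console.log(protectedCsv.split('\r\n').length);
```

The scanner never treats a backslash as an escape, so this step does not need the backslash placeholder. Whether parseCsv() itself is sensitive to backslashes is something I have not verified. The ::newline:: placeholders still have to be restored in each cell after parsing, as in the loop above.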
J. García answered Sep 18 '22 17:09