
How to split a string containing CSV data with arbitrary text into a JavaScript Array of Arrays?

I have a long string containing CSV data from a file. I want to store it in a JavaScript Array of Arrays. But one column has arbitrary text in it. That text could contain double-quotes and commas.

Splitting the CSV string into separate row strings is no problem:

var theRows = theCsv.split(/\r?\n/);

But then how would I best split each row?

Since it's CSV data I need to split on commas. But

var theArray = new Array();
for (var i = 0; i < theRows.length; i++) {
    theArray[i] = theRows[i].split(',');
}

won't work for elements containing quotes and commas, like this example:

512,"""Fake News"" and the ""Best Way"" to deal with A, B, and C", 1/18/2019,media

How can I make sure that 2nd element gets properly stored in a single array element as

 "Fake News" and the "Best Way" to deal with A, B, and C

Thanks.

The suggested solution which looked similar unfortunately did not work when I tried the CSVtoArray function there. Instead of returning array elements, a null value was returned, as described in my comment below.

Doug Lerner asked Feb 25 '19 06:02



1 Answer

This should do it:

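let parseRow = function(row) {
  let isInQuotes = false; // true while we are inside a quoted field
  let values = [];        // completed fields of this row
  let val = '';           // field currently being built

  for (let i = 0; i < row.length; i++) {
    switch (row[i]) {
      case ',':
        if (isInQuotes) {
          // A comma inside quotes is part of the field text.
          val += row[i];
        } else {
          // A comma outside quotes ends the current field.
          values.push(val);
          val = '';
        }
        break;

      case '"':
        if (isInQuotes && i + 1 < row.length && row[i+1] === '"') {
          // "" inside a quoted field is an escaped literal quote.
          val += '"';
          i++;
        } else {
          // Opening or closing quote: toggle the state, emit nothing.
          isInQuotes = !isInQuotes;
        }
        break;

      default:
        val += row[i];
        break;
    }
  }

  // The last field has no trailing comma, so push it explicitly.
  values.push(val);

  return values;
};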

It will return the values in an array:

parseRow('512,"""Fake News"" and the ""Best Way"" to deal with A, B, and C", 1/18/2019,media');
// => ['512', '"Fake News" and the "Best Way" to deal with A, B, and C', ' 1/18/2019', 'media']

To get the requested array of arrays you can do:

let parsedCsv = theCsv.split(/\r?\n/).map(parseRow);
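
One caveat with splitting on line breaks first: if the free-text column can itself contain a line break inside quotes, the split cuts that field in two before parseRow ever sees it. A minimal sketch of one way around this, assuming the same quoting rules as above, is to walk the whole string in a single pass. The name parseCsvText is hypothetical and not part of the answer above:

```javascript
// Hypothetical helper (not from the answer above): parses a whole CSV string
// in one pass so that line breaks inside quoted fields stay part of the field.
function parseCsvText(text) {
  const rows = [];
  let row = [];
  let val = '';
  let isInQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    if (isInQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        val += '"'; // doubled quote is an escaped literal quote
        i++;
      } else if (ch === '"') {
        isInQuotes = false; // closing quote
      } else {
        val += ch; // commas and line breaks are plain text in here
      }
    } else if (ch === '"') {
      isInQuotes = true; // opening quote
    } else if (ch === ',') {
      row.push(val);
      val = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat \r\n as one break
      row.push(val);
      rows.push(row);
      row = [];
      val = '';
    } else {
      val += ch;
    }
  }

  // Flush the last row unless the text ended with a line break.
  if (val !== '' || row.length > 0) {
    row.push(val);
    rows.push(row);
  }
  return rows;
}

const sample = 'id,note\n1,"line one\nline two, with comma"\n';
console.log(parseCsvText(sample));
// => [['id', 'note'], ['1', 'line one\nline two, with comma']]
```

This is only a sketch: blank lines come back as rows holding a single empty string, and malformed input such as an unclosed quote is not reported. For production data a maintained CSV library is probably the safer choice.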

Explanation

The code might look a little obscure, but the basic idea is this: we parse the string character by character. When we encounter an opening " we set isInQuotes = true, which changes how , and "" are handled: a comma becomes part of the value instead of ending it, and "" produces a single literal ". When we then encounter a single " we set isInQuotes = false again.
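
To make the state changes easier to follow, here is a small illustrative sketch that applies the same quote tracking but only reports which commas end up acting as field separators. The function name separatorPositions is made up for this illustration:

```javascript
// Hypothetical helper for illustration only: returns the indices of the
// commas that are treated as field separators (those outside quotes).
function separatorPositions(row) {
  let isInQuotes = false;
  const positions = [];

  for (let i = 0; i < row.length; i++) {
    const ch = row[i];
    if (ch === '"') {
      if (isInQuotes && row[i + 1] === '"') {
        i++; // escaped quote: skip its second half, stay inside quotes
      } else {
        isInQuotes = !isInQuotes; // opening or closing quote
      }
    } else if (ch === ',' && !isInQuotes) {
      positions.push(i); // only commas outside quotes split the row
    }
  }
  return positions;
}

console.log(separatorPositions('1,"a, ""b"", c",2'));
// => [1, 15]
```

The commas at indices 4 and 11 sit inside the quoted field, so they are skipped. Only the commas at indices 1 and 15 split the row.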

MaximeW answered Nov 04 '22 03:11
