
CSV separator auto-detection in JavaScript

How can I detect the CSV separator from a string in JavaScript/Node.js?

Is there a standard algorithm?

Note that the separator is not always a comma. The most common separators are ; (semicolon), , (comma) and \t (tab).

Ionică Bizău asked Sep 27 '13 14:09



1 Answer

A possible algorithm for finding the likely separator(s) is pretty simple, and assumes the data is well-formed (every row has the same number of fields):

  1. For every candidate delimiter,
    1. For every line,
      1. Split the line by the delimiter and count the resulting fields.
      2. If the count differs from the previous line's count, this is not a valid delimiter.

Proof of concept (doesn't handle quoted fields):

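function guessDelimiters (text, possibleDelimiters) {
    // Keep only the delimiters that split every line into the same number of fields.
    return possibleDelimiters.filter(weedOut);

    function weedOut (delimiter) {
        // Field count of the first non-empty line; -1 means "not seen yet".
        var cache = -1;
        // Accept both Unix (\n) and Windows (\r\n) line endings.
        return text.split(/\r?\n/).every(checkLength);

        function checkLength (line) {
            // Skip empty lines, such as the one after a trailing newline.
            if (!line) {
                return true;
            }

            var length = line.split(delimiter).length;
            if (cache < 0) {
                cache = length;
            }
            // The count must match the first line's, and the delimiter must actually occur.
            return cache === length && length > 1;
        }
    }
}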

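The length > 1 check makes sure the split didn't just return the whole line, which would mean the delimiter does not occur in it. Note that the function returns an array of possible delimiters: if it contains more than one item, you have an ambiguity problem, and if it is empty, none of the candidates fit.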
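The two gaps mentioned above (quoted fields and ambiguity) could be narrowed along the following lines. This is only a sketch, not part of the original answer: countFields and rankDelimiters are hypothetical names, the quoting rules are assumed to be RFC 4180-style double quotes, and "the candidate that yields more fields wins" is a heuristic assumption rather than a standard rule. Quoted fields that contain line breaks are still not handled, because the text is split into lines first.

```javascript
// Count the fields in one line, ignoring delimiters inside double quotes.
// Assumption: RFC 4180-style quoting, where "" inside a quoted field is an
// escaped quote. Toggling the flag on every quote character covers that case.
function countFields(line, delimiter) {
    if (!delimiter) {
        return 1; // an empty delimiter cannot split anything
    }
    var count = 1;
    var inQuotes = false;
    for (var i = 0; i < line.length; i++) {
        if (line[i] === '"') {
            inQuotes = !inQuotes;
        } else if (!inQuotes && line.startsWith(delimiter, i)) {
            count++;
            i += delimiter.length - 1; // skip the rest of a multi-character delimiter
        }
    }
    return count;
}

// Hypothetical helper: keep the candidates that give every non-empty line the
// same field count (greater than 1), then sort them so that the candidate
// producing the most fields comes first (a heuristic tie-breaker).
function rankDelimiters(text, candidates) {
    var lines = text.split(/\r?\n/).filter(function (line) {
        return line !== '';
    });
    return candidates
        .map(function (delimiter) {
            var counts = lines.map(function (line) {
                return countFields(line, delimiter);
            });
            var consistent = counts.length > 0 && counts[0] > 1 &&
                counts.every(function (c) { return c === counts[0]; });
            return { delimiter: delimiter, fields: consistent ? counts[0] : 0 };
        })
        .filter(function (result) { return result.fields > 0; })
        .sort(function (a, b) { return b.fields - a.fields; });
}

// Usage: the quoted fields contain both ';' and ',', so a plain split
// would see a different number of pieces on the header and data lines.
var sample = 'name;comment\n"Doe; John";"likes a, b"\n"Roe; Jane";"likes c, d"';
console.log(rankDelimiters(sample, [',', ';', '\t']));
```

For this sample only ; survives, with two fields per line. When several candidates survive, the first entry is merely the best guess under the heuristic above, so it may still be worth letting the user confirm the choice.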

Zirak answered Oct 10 '22 20:10
