How to properly split a CSV using C# split() function?

Tags: c#

Suppose I have this CSV file:

NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"

I would like to store each token that is enclosed in double quotes in an array. Is there a safe way to do this instead of using the String Split() function? Currently I load the file into a RichTextBox and then, using its Lines[] property, loop over each Lines[] element and do this:

string[] line = s.Split(',');

s is an element of RichTextBox.Lines[]. As you can see, a comma inside a token easily messes up Split(): instead of ending up with the three tokens I want, I end up with six.

Any help will be appreciated!

asked Jun 20 '13 at 07:06 by swdev


2 Answers

You could use a regex too, as long as every field on the line is enclosed in double quotes:

string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = @"""\s*,\s*""";

// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
    input.Substring(1, input.Length - 2), pattern);

This will give you:

Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
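
A variation on the same idea, as a minimal sketch: rather than splitting on the quote-comma-quote separators, match each quoted field directly, which avoids the Substring call. It relies on the same assumptions as the snippet above (every field is double-quoted, and no field contains a quote character). The names QuotedFields and Extract are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedFields
{
    // Hypothetical helper: returns the text between each pair of double quotes.
    // Assumes every field is quoted and no field contains a quote character.
    public static List<string> Extract(string line)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(line, "\"([^\"]*)\""))
        {
            // Group 1 is the field content without the surrounding quotes.
            tokens.Add(m.Groups[1].Value);
        }
        return tokens;
    }
}
```

For the sample line, QuotedFields.Extract(input) returns the same three tokens listed above. Unquoted fields would be skipped entirely, so this only fits files where quoting is consistent.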
answered Oct 05 '22 at 22:10 by unlimit


I've done this with my own method. It simply counts the number of " and ' characters.
Adapt it to your needs.

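    // Requires: using System.Collections.Generic;
    // Splits on commas that fall outside quotes. Quote characters stay in the tokens.
    public List<string> SplitCsvLine(string s) {
        int i;
        int a = 0;      // start index of the current token
        int count = 0;  // number of quote characters seen so far
        List<string> str = new List<string>();
        for (i = 0; i < s.Length; i++) {
            switch (s[i]) {
                case ',':
                    // An even count means we are outside any quoted section.
                    if ((count & 1) == 0) {
                        str.Add(s.Substring(a, i - a));
                        a = i + 1;
                    }
                    break;
                // Both quote styles share one counter, so an unpaired
                // apostrophe (as in O'Brien) flips the inside/outside state.
                case '"':
                case '\'': count++; break;
            }
        }
        str.Add(s.Substring(a)); // last token, after the final comma
        return str;
    }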
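
The method above leaves the quote characters and any surrounding spaces in each token. One possible way to extend it, sketched here under my own assumptions rather than as part of the original answer: track an in-quotes flag instead of a counter, treat only the double quote as a delimiter, strip the quotes, and read a doubled quote ("") inside a quoted field as one literal quote. The names CsvSketch and ParseLine are hypothetical, and the sketch handles a single line only (no fields spanning several lines).

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvSketch
{
    // Hypothetical helper: splits one CSV line into fields, keeping commas
    // that appear inside double quotes as part of the field.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;   // currently between an opening and closing quote
        bool wasQuoted = false;  // the current field had an opening quote

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote becomes one literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;    // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
                wasQuoted = true;
                current.Clear();         // discard anything before the opening quote
            }
            else if (c == ',')
            {
                fields.Add(Finish(current, wasQuoted));
                current.Clear();
                wasQuoted = false;
            }
            else if (!wasQuoted)
            {
                current.Append(c);       // unquoted field content
            }
            // Characters after a closing quote and before the next comma are ignored.
        }
        fields.Add(Finish(current, wasQuoted));
        return fields;
    }

    // Quoted fields are returned verbatim; unquoted fields are trimmed.
    private static string Finish(StringBuilder sb, bool wasQuoted)
    {
        return wasQuoted ? sb.ToString() : sb.ToString().Trim();
    }
}
```

Called on the sample line from the question, CsvSketch.ParseLine returns three fields with the quotes removed. For real files, a dedicated CSV parser is usually the safer choice, since this sketch ignores multi-line fields and malformed input.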
answered Oct 06 '22 at 00:10 by joe