Regex to split a line (CSV file)

Tags: c#, .net, regex, csv

I am not good at regex. Can someone help me write one?

I may have values like this while reading a CSV file:

"Artist,Name",Album,12-SCS
"val""u,e1",value2,value3

Output:

Artist,Name
Album
12-SCS
val"u,e1
value2
value3

Update: I like the idea of using the OleDb provider. We have a file upload control on the web page, and I read the file's content with a StreamReader without actually saving the file to the file system. Is there any way I can use the OleDb provider? The connection string needs a file name, and in my case the file is not saved on the file system.

shailesh asked Jul 16 '10 20:07


3 Answers

Just adding the solution I worked on this morning.

var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

foreach (Match m in regex.Matches("<-- input line -->"))
{
    var s = m.Value; 
}

As you can see, you need to call regex.Matches() once per line. It returns a MatchCollection with one match per column. The Value property of each match holds the raw text of that field; for a quoted field this still includes the surrounding quotes and any doubled "" escapes.

This is still a work in progress, but it happily parses CSV strings like:

2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
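
To get the output the question asks for, the enclosing quotes have to be stripped and each doubled "" turned back into a single quote. A minimal sketch of that post-processing, assuming the same pattern as above; the class CsvLineSplitter and its methods SplitLine and Unquote are hypothetical names for illustration, not part of the original answer:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper built around the pattern from the answer above.
public static class CsvLineSplitter
{
    // Same pattern as above: a field starts at the beginning of the line or
    // right after a comma, and is either a quoted string or a run of non-commas.
    private static readonly Regex FieldRegex =
        new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

    // Splits one CSV line (no embedded line breaks) into its field values.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldRegex.Matches(line))
        {
            fields.Add(Unquote(m.Value));
        }
        return fields;
    }

    // Strips one pair of enclosing quotes and turns "" back into ".
    private static string Unquote(string raw)
    {
        if (raw.Length >= 2 && raw[0] == '"' && raw[raw.Length - 1] == '"')
        {
            return raw.Substring(1, raw.Length - 2).Replace("\"\"", "\"");
        }
        return raw; // unquoted field: left untouched
    }
}
```

Calling CsvLineSplitter.SplitLine once per line read from the StreamReader would then yield the unquoted values. Quoted fields that span several lines are not handled by this sketch.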
Joshua answered Nov 16 '22 01:11


Actually, it's pretty easy to match CSV lines with a regex. Try this one out:

StringCollection resultList = new StringCollection();
try {
    Regex pattern = new Regex(@"
        # Parse CSV line. Capture next value in named group: 'val'
        \s*                      # Ignore leading whitespace.
        (?:                      # Group of value alternatives.
          ""                     # Either a double quoted string,
          (?<val>                # Capture contents between quotes.
            [^""]*(""""[^""]*)*  # Zero or more non-quotes, allowing 
          )                      # doubled "" quotes within string.
          ""\s*                  # Ignore whitespace following quote.
        |  (?<val>[^,]*)         # Or... zero or more non-commas.
        )                        # End value alternatives group.
        (?:,|$)                  # Match end is comma or EOS", 
        RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    Match matchResult = pattern.Match(subjectString);
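    while (matchResult.Success) {
        // A value starts at index 0 or right after a comma. An empty value
        // found anywhere else is the extra match the pattern produces after
        // the last value (via '$'), so it is skipped.
        int start = matchResult.Index;
        bool extraEmptyMatch = start > 0
            && subjectString[start - 1] != ','
            && matchResult.Groups["val"].Length == 0;
        if (!extraEmptyMatch) {
            resultList.Add(matchResult.Groups["val"].Value);
        }
        matchResult = matchResult.NextMatch();
    }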
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Disclaimer: The regex has been tested in RegexBuddy (which generated this snippet), and it correctly matches the OP's test data, but the C# code logic is untested. (I don't have access to C# tools.)

ridgerunner answered Nov 16 '22 01:11


Regex is not a suitable tool for this. Use a CSV parser, either the built-in one or a third-party one.
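
One built-in option is TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, which can be used from C# (on .NET Framework this needs a reference to Microsoft.VisualBasic.dll). It accepts a TextReader or Stream, so it should also fit the question's case where the uploaded file is never saved to disk. A minimal sketch; the class name CsvStreamParser and the method ReadAll are hypothetical, chosen for illustration:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper around the framework's TextFieldParser.
public static class CsvStreamParser
{
    // Reads every row from the reader; each row is an array of field values.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // handle "quoted, fields"
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows; // disposing the parser also closes the reader
    }
}
```

With an upload, this could be called as CsvStreamParser.ReadAll(new StreamReader(uploadStream)), where uploadStream stands for whatever stream the upload control exposes; for a quick check, a StringReader over the sample lines works as well.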

BalusC answered Nov 16 '22 02:11