
Regular Expression to split by comma + ignores comma within double quotes. VB.NET

Tags:

regex

vb.net

I'm trying to parse a CSV file with VB.NET.

The CSV file contains values like 0,"1,2,3",4, which splits into 5 fields instead of 3. There are many examples in other languages on Stack Overflow, but I can't implement it in VB.NET. Here is my code so far, but it doesn't work...

 Dim t As String() = Regex.Split(str(i), ",(?=([^\""]*\""[^\""]*\"")*[^\""]*$)")
shinya asked Feb 07 '12 00:02


2 Answers

Assuming your CSV is well-formed (i.e. no " other than those used to delimit string fields, or those escaped like \"), you can split on a comma that's followed by an even number of non-escaped " marks. (If the comma is inside a quoted field, an odd number of quote marks remains on the rest of the line.)

The regex you've tried is almost there.

The following looks for a comma followed by an even number of any sort of quote marks:

,(?=([^"]*"[^"]*")*[^"]*$)

To modify it to look for an even number of non-escaped quote marks (assuming quote marks are escaped with a backslash, like \"), I replace each [^"] with ([^"\\]|\\.). This means "match a character that isn't a " and isn't a backslash, OR match a backslash and the character immediately following it".

,(?=(([^"\\]|\\.)*"([^"\\]|\\.)*")*([^"\\]|\\.)*$)

(The backslash is doubled because I want to match a literal backslash.)

Now, to get it into VB.NET you just need to double all the quote marks:

splitRegex = ",(?=(([^""\\]|\\.)*""([^""\\]|\\.)*"")*([^""\\]|\\.)*$)"
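
One thing to watch for when passing this pattern to Regex.Split: .NET adds any text captured by capturing parentheses to the result array, so the groups in the lookahead can produce extra elements. A minimal sketch, assuming the same backslash-escaping convention as above, switches every group to the non-capturing form (?:...). The module name and the sample line are only illustrative.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitCsvLineSketch
    Sub Main()
        ' Same lookahead as above, but with (?:...) groups so that
        ' Regex.Split does not add captured text to the result array.
        Dim pattern As String = ",(?=(?:(?:[^""\\]|\\.)*""(?:[^""\\]|\\.)*"")*(?:[^""\\]|\\.)*$)"

        ' Sample line from the question (quotes doubled for the VB string literal).
        Dim line As String = "0,""1,2,3"",4"

        Dim fields As String() = Regex.Split(line, pattern)

        For i As Integer = 0 To fields.Length - 1
            ' The split leaves the delimiting quotes in place, so trim them here.
            Console.WriteLine("Field {0}: {1}", i, fields(i).Trim(""""c))
        Next
    End Sub
End Module
```

For the sample line this should print three fields: 0, then 1,2,3, then 4. Passing RegexOptions.ExplicitCapture to Regex.Split is another way to stop unnamed groups from capturing without rewriting the pattern.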
mathematical.coffee answered Sep 22 '22 04:09


Instead of a regular expression, try using the TextFieldParser class for reading .csv files. It handles your situation exactly.

TextFieldParser Class

Especially look at the HasFieldsEnclosedInQuotes property.

Example:

Note: I used a string instead of a file, but the result would be the same.

    ' Requires: Imports System.IO and Imports Microsoft.VisualBasic.FileIO
    Dim theString As String = "1,""2,3,4"",5"

    Using rdr As New StringReader(theString)
        Using parser As New TextFieldParser(rdr)
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {","}
            parser.HasFieldsEnclosedInQuotes = True
            Dim fields() As String = parser.ReadFields()

            For i As Integer = 0 To fields.Length - 1
                Console.WriteLine("Field {0}: {1}", i, fields(i))
            Next
        End Using
    End Using

Output:

Field 0: 1
Field 1: 2,3,4
Field 2: 5
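
For an actual file, the same parser can be opened on a path and read row by row until EndOfData is True. The sketch below is only an illustration: it writes a hypothetical two-line sample to a temporary file so that it is self-contained, and the module and variable names are made up.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module ReadCsvFileSketch
    Sub Main()
        ' Hypothetical sample file, created here only so the sketch runs on its own.
        Dim csvPath As String = Path.GetTempFileName()
        File.WriteAllLines(csvPath, New String() {"0,""1,2,3"",4", "1,""2,3,4"",5"})

        Using parser As New TextFieldParser(csvPath)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True

            ' Each ReadFields call returns the fields of the next row.
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                Console.WriteLine("{0} fields: {1}", fields.Length, String.Join(" | ", fields))
            End While
        End Using

        ' Clean up the temporary sample file.
        File.Delete(csvPath)
    End Sub
End Module
```

Each of the two sample rows should come back as 3 fields, with the quoted commas kept inside the middle field.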
Chris Dunaway answered Sep 20 '22 04:09