I'm trying to parse csv file with VB.NET.
csv files contains value like 0,"1,2,3",4 which splits in 5 instead of 3. There are many examples with other languages in Stockoverflow but I can't implement it in VB.NET. Here is my code so far but it doesn't work...
Dim t As String() = Regex.Split(str(i), ",(?=([^\""]*\""[^\""]*\"")*[^\""]*$)")
Assuming your csv is well-formed (ie no "
besides those used to delimit string fields, or besides ones escaped like \"
), you can split on a comma that's followed by an even number of non-escaped "-marks. (If you're inside a set of "" there's only an odd number left in the line).
Your regex you've tried looks like you're almost there.
The following looks for a comma followed by an even number of any sort of quote marks:
,(?=([^"]*"[^"]*")*[^"]*$)
To modify it to look for an even number of non-escaped quote marks (assuming quote marks are escaped with backslash like \"
), I replace each [^"]
with ([^"\\]|\\.)
. This means "match a character that isn't a " and isn't a blackslash, OR match a backslash and the character immediately following it".
,(?=(([^"\\]|\\.)*"([^"\\]|\\.)*")*([^"\\]|\\.)*$)
See it in action here. (The reason the backslash is doubled is I want to match a literal backslash).
Now to get it into vb.net you just need to double all your quote marks:
splitRegex = ",(?=(([^""\\]|\\.)*""([^""\\]|\\.)*"")*([^""\\]|\\.)*$)"
Instead of a regular expression, try using the TextFieldParser class for reading .csv files. It handles your situation exactly.
TextFieldParserClass
Especially look at the HasFieldsEnclosedInQuotes property.
Example:
Note: I used a string instead of a file, but the result would be the same.
Dim theString As String = "1,""2,3,4"",5"
Using rdr As New StringReader(theString)
Using parser As New TextFieldParser(rdr)
parser.TextFieldType = FieldType.Delimited
parser.Delimiters = New String() {","}
parser.HasFieldsEnclosedInQuotes = True
Dim fields() As String = parser.ReadFields()
For i As Integer = 0 To fields.Length - 1
Console.WriteLine("Field {0}: {1}", i, fields(i))
Next
End Using
End Using
Output:
Field 0: 1
Field 1: 2,3,4
Field 2: 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With