I have a VBA application that runs every day. It checks a folder where CSVs are downloaded automatically, and adds their contents to a database. When parsing them, I realized that certain values had commas as a part of their name. These values were contained in string literals.
So I'm trying to figure out how to parse this CSV and ignore commas that are contained in string literals. For example...
1,2,3,"This should,be one part",5,6,7 Should return
1
2
3
"This should,be one part"
5
6
7
I have been using VBA's split() function, because I don't wanna reinvent the wheel, but if I have to I guess I'll do something else.
Any suggestions would be appreciated.
The first way to solve this problem is to look at the structure of the line from the csv file (int,int,"String literal, will have at most one comma", etc). A naive solution would be (Assuming that the line don't have any semicolons)
Function splitLine1(line As String) As String()
Dim temp() As String
'Splits the line in three. The string delimited by " will be at temp(1)
temp = Split(line, Chr(34)) 'chr(34) = "
'Replaces the commas in the numeric fields by semicolons
temp(0) = Replace(temp(0), ",", ";")
temp(2) = Replace(temp(2), ",", ";")
'Joins the temp array with quotes and then splits the result using the semicolons
splitLine1 = Split(Join(temp, Chr(34)), ";")
End Function
This function only solves this particular problem. Another way to do the job is using the regular expression object from VBScript.
Function splitLine2(line As String) As String()
Dim regex As Object
Set regex = CreateObject("vbscript.regexp")
regex.IgnoreCase = True
regex.Global = True
'This pattern matches only commas outside quotes
'Pattern = ",(?=([^"]*"[^"]*")*(?![^"]*"))"
regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
'regex.replaces will replace the commas outside quotes with semicolons and then the
'Split function will split the result based on the semicollons
splitLine2 = Split(regex.Replace(line, ";"), ";")
End Function
This method seems much more cryptic, but does not deppends on the structure of the line
You can read more about regular expressions patterns in VBScript Here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With