
How to use a regular expression in an iPhone app to split a string on , (comma)

I have to read a .csv file that has three columns. While parsing the file, I get each line as a string in this format: Christopher Bass,\"Cry the Beloved Country Final Essay\",[email protected]. I want to store the values of the three columns in an array, so I used the componentsSeparatedByString:@"," method. It successfully returns an array with three components:

  1. Christopher Bass
  2. Cry the Beloved Country Final Essay
  3. [email protected]

But when a column value already contains a comma, as in Christopher Bass,\"Cry, the Beloved Country Final Essay\",[email protected], it splits the string into four components, because there is a , (comma) after Cry:

  1. Christopher Bass
  2. Cry
  3. the Beloved Country Final Essay
  4. [email protected]

So how can I handle this with a regular expression? I have the "RegexKitLite" classes, but which regular expression should I use? Please help!

Thanks-

Developer asked Jan 31 '12 16:01


2 Answers

Any regular expression will probably run into the same problem. What you need is to sanitize your entries, either by escaping the commas inside them or by enclosing each string in quotes, like this: "My string". Otherwise the ambiguity remains. Good luck.

For your example you would probably need to do something like:

\"Christopher Bass\",\"Cry\, the Beloved Country Final Essay\",\"[email protected]\"

That way you could use a regexp or even the same method from the NSString class.
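
Once the fields are quoted like that, the splitting itself does not need a regular expression at all. Here is a minimal sketch of a scanner that only treats a comma as a separator when it is outside quotes. It is written in Python to keep it short; the function name `split_csv_line` and the `user@example.com` address are made up for illustration, and it deliberately ignores backslash escapes and doubled quotes.

```python
def split_csv_line(line, delimiter=",", quote='"'):
    """Split one CSV line on delimiters that sit outside quoted fields."""
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == quote:
            # Toggle the quoted state; the quote character itself is dropped.
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            # An unquoted delimiter ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    # The last field has no trailing delimiter, so flush it here.
    fields.append("".join(current))
    return fields


# Hypothetical row modelled on the question; the address is a placeholder.
row = 'Christopher Bass,"Cry, the Beloved Country Final Essay",user@example.com'
print(split_csv_line(row))
```

The same loop should translate to Objective-C fairly directly, walking the NSString character by character and collecting fields into an NSMutableArray.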

Not directly related, but on the importance of sanitizing strings: http://xkcd.com/327/ hehehe.

El Developer answered Oct 20 '22 01:10


How about this:

componentsSeparatedByRegex:@",\\\"|\\\","

This should split your string wherever " and , appear together in either order, resulting in a three-member array. This of course assumes that the second element in the string is always enclosed in quotation marks, and that the characters " and , never appear consecutively within any of the three components.
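
To see what that pattern does, here is a small Python sketch of the same split. Without the Objective-C string escaping, the pattern is just `,"|",`. The helper name `split_on_quote_comma` and the `user@example.com` address are placeholders for illustration, not part of RegexKitLite.

```python
import re

# Same pattern as above, minus the Objective-C string-literal escaping:
# match either a comma followed by a quote, or a quote followed by a comma.
SEPARATOR = re.compile(r',"|",')


def split_on_quote_comma(line):
    """Split a line wherever a comma and a double quote are adjacent."""
    return SEPARATOR.split(line)


# Hypothetical row modelled on the question; the address is a placeholder.
quoted = 'Christopher Bass,"Cry, the Beloved Country Final Essay",user@example.com'
print(split_on_quote_comma(quoted))

# If the second field is not wrapped in double quotes, nothing matches
# and the whole line comes back as a single component.
unquoted = 'Christopher Bass,Cry the Beloved Country Final Essay,user@example.com'
print(split_on_quote_comma(unquoted))
```

The second call shows why the first assumption matters: the split only works when the middle field is actually wrapped in double quotes.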

If either of these assumptions is incorrect, other methods of identifying the string components may be used, but it should be made clear that no generic solution exists. If the three component strings can contain " and , anywhere, not even a limited solution is possible, as in this case:

Doe, John,\"\"Why Unescaped Strings Suck\", And Other Development Horror Stories\",Doe, John <[email protected]>

Hopefully there is nothing like the above in your CSV data. If there is, the data is basically unusable, and you should look into a better CSV exporter.
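
For comparison, here is a rough sketch of what a well-behaved exporter does with that same row, using Python's standard csv module (not the tool that produced your file). By default its writer wraps fields containing commas or quotes in double quotes and doubles any embedded quote, which makes the line unambiguous to read back. The helper names `export_row` and `parse_row` and the `jdoe@example.com` address are made up for this example.

```python
import csv
import io


def export_row(fields):
    """Write one row as CSV text, quoting and escaping fields where needed."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


def parse_row(text):
    """Read a single CSV row back into a list of field strings."""
    return next(csv.reader(io.StringIO(text, newline="")))


# Hypothetical fields modelled on the horror-story row; the address is a placeholder.
fields = [
    "Doe, John",
    '"Why Unescaped Strings Suck", And Other Development Horror Stories',
    "Doe, John <jdoe@example.com>",
]

line = export_row(fields)
print(line)             # embedded quotes are doubled, fields are wrapped in quotes
print(parse_row(line))  # the original three fields come back unchanged
```

With output like this, the quote-aware parsing becomes straightforward again, which is the point of fixing the exporter rather than the regex.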

Feysal answered Oct 20 '22 02:10