Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CSV Parsing Strings containing double quotes and commas

Apologies if this question is a duplicate.

I'm trying to create a CSV file with, as an example, a Title, Name and Comments column. Each row of the CSV has it's values for each column being read from the Database into the relevant variable, and these variables, split by commas, are then added to a variable 'newline' which was then appended to the csv as follows:

    string title = reader[0]
    string name = reader[1]
    string comments = reader[2]
    var csv = new StringBuilder();
    var headerLine = "Title,Name,Comments"
    csv.AppendLine(headerLine);

    var newline = title + "," + name + "," + comments;
    csv.AppendLine(newline);

Originally I took the above approach, but quickly ran into the problem where, if any of the fields read in from the database had commas in them, that field would be split in the middle, e.g. if comments = "Comment, goes, here" would be spread across 3 columns in the out CSV instead of 1.

To try and overcome this problem, I changed my code so that all fields would be wrapped in double quotes, meaning commas within a field would no longer split the CSV, as follows:

    var newline = "\"" + title + "\",\"" + name + "\",\"" + comments + "\""

Now, with this in place, I am getting a problem where if any of the values read in from the Database contain a double quote, this messes up everything, since a double quote from the database is displayed as \" when the field is converted into a string, which is what I'm already using to wrap the start and end of each field in double quotes to prevent commas splitting fields containing commas.

like image 712
axel Avatar asked Feb 08 '23 02:02

axel


2 Answers

You need to escape the double quotes. What you escape them with depends on whatever you plan to read them with. It's possible that the escape sequence should be a backslash followed by a double quote, but if using Excel it will be two double quotes in a row. That can be done using string.Replace.

var newline = "\"" + title.Replace("\"", "\"\"") + 
              "\",\"" + name.Replace("\"", "\"\"") + 
              "\",\"" + comments.Replace("\"", "\"\"") + "\"";

Or using string.Format

var newline = string.Format(
    "\"{0}\",\"{1}\",\"{2}\"",
    title.Replace("\"", "\"\""),
    name.Replace("\"", "\"\""),
    comments.Replace("\"", "\"\""));
like image 77
juharr Avatar answered Feb 10 '23 16:02

juharr


The rule is simple: if item contains either , (comma) or " (quotation mark) it should be put into quotation marks; each " within the item should be doubled:

  abcde    -> abcde        // or "abcde", quotation is not mandatory, however
  abc,de   -> "abc,de"
  abc"de   -> "abc""de"
  abc","de -> "abc"",""de"

possible implementation:

    public static String EncodeCsvItem(String value) {
      if (String.IsNullOrEmpty(value))
        return "";

      StringBuilder Sb = new StringBuilder(value.Length + 2);

      bool wrap = false;

      foreach (var ch in value) {
        if (ch == '"') {
          Sb.Append('"');
          wrap = true;
        }
        else if (ch == ',')
          wrap = true;

        Sb.Append(ch);
      }

      if (wrap) {
        Sb.Insert(0, '"');
        Sb.Append('"');
      }

      return Sb.ToString();
    }

....

   var newline = String.Join(",",
     EncodeCsvItem(reader[0]),
     EncodeCsvItem(reader[1]),
     EncodeCsvItem(reader[2]) 
   );

   csv.AppendLine(newline);
like image 36
Dmitry Bychenko Avatar answered Feb 10 '23 16:02

Dmitry Bychenko