How do I force one field in Ruby's CSV output to be wrapped with double-quotes?

I'm generating some CSV output using Ruby's built-in CSV. Everything works fine, but the customer wants the name field in the output to have wrapping double-quotes so the output looks like the input file. For instance, the input looks something like this:

    1,1.1.1.1,"Firstname Lastname",more,fields
    2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

CSV's output, which is correct, looks like:

    1,1.1.1.1,Firstname Lastname,more,fields
    2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

I know CSV is doing the right thing by not double-quoting the third field just because it has embedded blanks, and wrapping the field with double-quotes when it has the embedded comma. What I'd like to do, to help the customer feel warm and fuzzy, is tell CSV to always double-quote the third field.

I tried wrapping the field in double-quotes in my to_a method, which creates a "Firstname Lastname" field being passed to CSV, but CSV laughed at my puny-human attempt and output """Firstname Lastname""". That is the correct thing to do because it's escaping the double-quotes, so that didn't work.
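The escaping described above can be reproduced in a few lines. This is a minimal sketch using CSV.generate_line; the sample row is made up for illustration and is not the real customer data.

```ruby
require 'csv'

# The name field is pre-wrapped in double-quotes before CSV sees it.
row = ['1', '1.1.1.1', '"Firstname Lastname"', 'more', 'fields']

# CSV finds embedded quote characters, so it quotes the whole field
# and doubles each embedded quote to escape it.
line = CSV.generate_line(row)
puts line
# >> 1,1.1.1.1,"""Firstname Lastname""",more,fields
```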

Then I tried setting CSV's :force_quotes => true in the open method, which output double-quotes wrapping all fields as expected, but the customer didn't like that, which I expected also. So, that didn't work either.
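For comparison, here is a small sketch of the default output next to the :force_quotes output. It uses CSV.generate_line rather than the open call from the real script, and the row is illustrative only.

```ruby
require 'csv'

row = [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields']

# Default: a field is quoted only when it needs it.
default_line = CSV.generate_line(row)
# force_quotes: every field is quoted, which is more than the customer wants.
forced_line  = CSV.generate_line(row, force_quotes: true)

puts default_line
# >> 1,1.1.1.1,Firstname Lastname,more,fields
puts forced_line
# >> "1","1.1.1.1","Firstname Lastname","more","fields"
```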

I've looked through the Table and Row docs and nothing appeared to give me access to the "generate a String field" method, or a way to set a "for field n always use quoting" flag.

I'm about to dive into the source to see if there are some super-secret tweaks, or if there's a way to monkey-patch CSV and bend it to do my will, but wondered if anyone had some special knowledge or had run into this before.

And, yes, I know I could roll my own CSV output, but I prefer to not reinvent well-tested wheels. And, I'm also aware of FasterCSV; that's now part of Ruby 1.9.2, which I'm using, so explicitly using FasterCSV buys me nothing special. Also, I'm not using Rails and have no intention of rewriting it in Rails, so unless you have a cute way of implementing it using a small subset of Rails, don't bother. I'll downvote any recommendations to use any of those ways just because you didn't bother to read this far.

the Tin Man asked Jan 31 '11 19:01



2 Answers

Well, there's a way to do it but it wasn't as clean as I'd hoped the CSV code could allow.

I had to subclass CSV, then override its << method and add another method, forced_quote_fields=, to define the fields I want to force quoting on, plus pull two lambdas from other methods. At least it works for what I want:

    require 'csv'

    class MyCSV < CSV
      def <<(row)
        # make sure headers have been assigned
        if header_row? and [Array, String].include? @use_headers.class
          parse_headers  # won't read data for Array or String
          self << @headers if @write_headers
        end

        # handle CSV::Row objects and Hashes
        row = case row
          when self.class::Row then row.fields
          when Hash            then @headers.map { |header| row[header] }
          else                      row
        end

        @headers = row if header_row?
        @lineno  += 1

        @do_quote ||= lambda do |field|
          field         = String(field)
          encoded_quote = @quote_char.encode(field.encoding)
          encoded_quote                                +
          field.gsub(encoded_quote, encoded_quote * 2) +
          encoded_quote
        end

        @quotable_chars      ||= encode_str("\r\n", @col_sep, @quote_char)
        @forced_quote_fields ||= []

        @my_quote_lambda ||= lambda do |field, index|
          if field.nil?  # represent +nil+ fields as empty unquoted fields
            ""
          else
            field = String(field)  # Stringify fields
            # represent empty fields as empty quoted fields
            if (
              field.empty?                          or
              field.count(@quotable_chars).nonzero? or
              @forced_quote_fields.include?(index)
            )
              @do_quote.call(field)
            else
              field  # unquoted field
            end
          end
        end

        output = row.map.with_index(&@my_quote_lambda).join(@col_sep) + @row_sep  # quote and separate
        if (
          @io.is_a?(StringIO)             and
          output.encoding != raw_encoding and
          (compatible_encoding = Encoding.compatible?(@io.string, output))
        )
          @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
          @io.seek(0, IO::SEEK_END)
        end
        @io << output

        self  # for chaining
      end
      alias_method :add_row, :<<
      alias_method :puts,    :<<

      def forced_quote_fields=(indexes=[])
        @forced_quote_fields = indexes
      end
    end

That's the code. Calling it:

    data = [
      %w[1 2 3],
      [ 2, 'two too',  3 ],
      [ 3, 'two, too', 3 ]
    ]

    quote_fields = [1]

    puts "Ruby version:   #{ RUBY_VERSION }"
    puts "Quoting fields: #{ quote_fields.join(', ') }", "\n"

    csv = MyCSV.generate do |_csv|
      _csv.forced_quote_fields = quote_fields
      data.each do |d|
        _csv << d
      end
    end

    puts csv

results in:

    # >> Ruby version:   1.9.2
    # >> Quoting fields: 1
    # >>
    # >> 1,"2",3
    # >> 2,"two too",3
    # >> 3,"two, too",3
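Because the subclass reaches into CSV's instance variables, it is tied to the internals of the Ruby version it was written against. A possible alternative, sketched here under that concern, is to leave CSV untouched and build each line field by field: quote the chosen indexes by hand and let CSV.generate_line decide for the rest. The helper name line_with_forced_quotes and its parameters are hypothetical, not part of Ruby's CSV API, and this is a different technique from the subclass, not a rewrite of it.

```ruby
require 'csv'

# Hypothetical helper (not part of Ruby's CSV API): builds one CSV line,
# always quoting the fields whose indexes appear in forced_indexes.
def line_with_forced_quotes(row, forced_indexes, col_sep: ',', quote_char: '"')
  fields = row.each_with_index.map do |field, index|
    if forced_indexes.include?(index) && !field.nil?
      # Quote unconditionally, doubling any embedded quote characters.
      text = String(field)
      quote_char + text.gsub(quote_char, quote_char * 2) + quote_char
    else
      # Let CSV decide whether this single field needs quoting.
      CSV.generate_line([field], col_sep: col_sep, quote_char: quote_char).chomp
    end
  end
  fields.join(col_sep) + "\n"
end

data = [
  %w[1 2 3],
  [2, 'two too',  3],
  [3, 'two, too', 3]
]

forced_output = data.map { |row| line_with_forced_quotes(row, [1]) }.join
puts forced_output
# >> 1,"2",3
# >> 2,"two too",3
# >> 3,"two, too",3
```

The trade-off is that headers, CSV::Row objects and encodings are not handled here; the sketch only covers plain arrays of fields.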
the Tin Man answered Oct 06 '22 08:10



This post is old, but I can't believe no one thought of this.

Why not do:

    csv = CSV.generate :quote_char => "\0" do |csv|

where \0 is a null character, then just add quotes to each field where they are needed:

    csv << [product.upc, "\"" + product.name + "\"" # ...

Then at the end you can do a

    csv.gsub!(/\0/, '')
Tom Grushka answered Oct 06 '22 08:10
