
Ruby CSV.open need to remove quotes and null characters

Tags: ruby, csv

I am retrieving a large set of results from a database query and writing them to a CSV file. The code block below takes the results and creates the file. With the quote_char: option set to "\0", NUL characters are written in place of the quotes, which I need in order to create the tab-delimited file properly.

However, the NUL characters are converted into "" when the file is loaded into its destination, so I would like to remove them. If I leave out quote_char:, every field is double quoted, which causes the same result.

How can I remove the NUL characters?

begin
    CSV.open("#{file_path}"'file.tab', "wb", Options = {col_sep: "\t", quote_char: "\0"}) do |csv|
        csv << ["Key","channel"]           
        series_1_results.each_hash do |series_1|
         csv << ["#{series_1['key']}","#{series_1['channel']}"]
        end
    end
end
analyticsPierce asked May 10 '13 07:05



2 Answers

As stated in the CSV documentation, you have to set quote_char to some character, and that character will always be used to quote empty fields.

It seems the only solution in this case is to remove the quote characters from the generated file afterwards. You can do it like this:

# Read the generated file, strip every NUL character, and write the result to a new file.
quotedFile = File.read("#{file_path}"'file.tab')
unquotedFile = quotedFile.gsub("\0", "")
File.open("#{file_path}"'unquoted_file.tab',"w") { |file| file.puts unquotedFile }

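I assume here that the NUL characters appear only as quoting around empty fields. If that's not the case, use the default quote_char: '"' and gsub(',"",', ',,') (with a tab in place of each comma for a tab-delimited file), which blanks out the quoted empty fields while keeping the separators and should handle almost all cases of fields containing special characters.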
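For a large file, the same clean-up can be done line by line instead of reading everything into memory. The following is only a sketch under that assumption: strip_nuls is a hypothetical helper name, and the temporary files stand in for file.tab and unquoted_file.tab.

```ruby
require "tempfile"

# Hypothetical helper: copy src_path to dest_path, dropping every NUL character.
# File.foreach yields one line at a time, so the whole file is never held in memory.
def strip_nuls(src_path, dest_path)
  File.open(dest_path, "wb") do |out|
    File.foreach(src_path) do |line|
      out.write(line.delete("\0"))
    end
  end
end

# Demo input, assuming empty fields were written as a pair of NUL characters.
src = Tempfile.new("quoted")
src.write("Key\tchannel\nabc\t\0\0\n")
src.close

dest = Tempfile.new("unquoted")
dest.close

strip_nuls(src.path, dest.path)
result = File.read(dest.path)
```

This still processes the output twice, so it only addresses memory use, not the extra pass.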

But since you note that the results of your query are large, it might be more practical to write the file yourself and avoid processing the output twice. You could simply write:

File.open("#{file_path}"'unquoted_file.tab',"w") do |file|
    # Join the fields with a tab so each row is written as one line.
    file.puts ["Key","channel"].join("\t")
    series_1_results.each_hash do |series_1|
        file.puts [series_1['key'], series_1['channel']].join("\t")
    end
end

Once more, you might need to handle fields with special characters.
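One way to handle such fields when writing the file by hand is to replace any embedded tab or line break before joining. This is a minimal sketch, not a full escaping scheme: tsv_field is a hypothetical helper, and replacing the characters with a space is an assumption about what the destination accepts.

```ruby
# Hypothetical helper: make a value safe for a hand-written tab-delimited line
# by collapsing embedded tabs and line breaks into a single space.
# nil becomes an empty string, so the field is simply left blank.
def tsv_field(value)
  value.to_s.gsub(/[\t\r\n]+/, " ")
end

line = [tsv_field("a\tb"), tsv_field(nil), tsv_field("web")].join("\t")
```

Inside the loop above, each value would be passed through tsv_field before the join.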

Legat answered Sep 30 '22 00:09


From the Ruby CSV docs, setting force_quotes: false in the options seems to work.

CSV.open("#{file_path}"'file.tab', "wb", col_sep: "\t", force_quotes: false) do |csv|

The above does the trick. I'd suggest against setting quote_char to \0 since that doesn't work as expected.

There is one thing to note, though. If a field is an empty string "", the quote_char is still printed into the CSV, but a nil value is written as a blank field with no quotes. So if you expect empty strings in the data, convert them to nil when writing to the CSV (for example with the ActiveSupport presence method or anything similar).
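The empty-string-to-nil conversion could look like the sketch below, which uses plain Ruby rather than ActiveSupport. blank_to_nil and the sample rows are hypothetical, and CSV.generate is used instead of CSV.open only so the output can be inspected as a string.

```ruby
require "csv"

# Hypothetical helper: return nil for nil or empty values so that CSV
# writes a blank field instead of a quoted empty string.
def blank_to_nil(value)
  (value.nil? || value.to_s.empty?) ? nil : value
end

# Sample rows standing in for the database results.
rows = [
  { "key" => "abc", "channel" => "" },
  { "key" => "def", "channel" => "web" }
]

tsv = CSV.generate(col_sep: "\t") do |csv|
  csv << ["Key", "channel"]
  rows.each do |row|
    csv << [blank_to_nil(row["key"]), blank_to_nil(row["channel"])]
  end
end
```

The same two csv << lines work unchanged inside a CSV.open block.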

Subhas answered Sep 29 '22 23:09