I am retrieving a large hash of results from a database query and writing them to a CSV file. The code block below takes the results and creates the CSV. With the quote_char: option it replaces the quotes with NULL characters, which I need in order to create the tab-delimited file properly.
However, the NULL characters are getting converted into "" when they are loaded into their destination, so I would like to remove them. If I leave out quote_char:, every field is double quoted, which causes the same result.
How can I remove the NULL characters?
require "csv"

begin
  CSV.open("#{file_path}file.tab", "wb", col_sep: "\t", quote_char: "\0") do |csv|
    csv << ["Key", "channel"]
    series_1_results.each_hash do |series_1|
      csv << ["#{series_1['key']}", "#{series_1['channel']}"]
    end
  end
end
Click on Replace (or hit Ctrl+H). In the "Find what" field of the dialog box, type a double quote. Leave the "Replace with" field empty (or, if you want some other character, put it there). Then either click the "Replace All" button to do everything at once, or click "Replace" and go through the matches one by one.
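If you would rather script that find-and-replace, the same "Replace All" step can be done with a short Ruby sketch; the file name here is just the one used in the question:

# Strip every double quote from the exported file, like "Replace All" with an empty replacement.
text = File.read("file.tab")
File.write("file.tab", text.delete('"'))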
Quotation marks appear in CSV files as text qualifiers. This means they function to wrap together text that should be kept as one value, as opposed to distinct values that should be separated out.
Fields that contain commas must begin and end with double quotes. Fields that contain line breaks must begin and end with double quotes (not all programs support values with line breaks). All other fields do not require double quotes.
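As a quick illustration of those rules, here is a small Ruby sketch using the standard csv library to show which fields the writer quotes by default:

require "csv"

# Only fields containing the separator, the quote character, or a line
# break are wrapped in double quotes by default.
puts ["plain", "has,comma", "say \"hi\"", "two\nlines"].to_csv
# => plain,"has,comma","say ""hi""","two
#    lines"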
As stated in the csv documentation, you have to set quote_char to some character, and that character will always be used to quote empty fields.
It seems the only solution in this case is to remove the quote_char characters from the created CSV file afterwards. You can do it like this:
quoted_file = File.read("#{file_path}file.tab")
unquoted_file = quoted_file.gsub("\0", "")
File.open("#{file_path}unquoted_file.tab", "w") { |file| file.puts unquoted_file }
I assume here that the NULL character appears only as the quote character, so stripping it is safe. If that's not the case, use the default quote_char: '"' and strip the quoting around empty fields instead, e.g. gsub("\t\"\"\t", "\t\t"), which should handle almost all possible cases of fields containing special characters.
But since, as you note, the results of your query are large, it might be more practical to build the tab-delimited file yourself and avoid processing the output twice. You could simply write:
File.open("#{file_path}unquoted_file.tab", "w") do |file|
  file.puts ["Key", "channel"].join("\t")
  series_1_results.each_hash do |series_1|
    file.puts "#{series_1['key']}\t#{series_1['channel']}"
  end
end
Once more, you might need to handle fields with special characters.
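For example, here is a minimal sketch of how the values could be sanitised before joining them with tabs; the sanitize_field helper is hypothetical, and it assumes embedded tabs and newlines can simply be replaced with spaces for this export:

# Hypothetical helper: replace characters that would break a tab-delimited
# row (tabs, carriage returns, newlines) with plain spaces.
def sanitize_field(value)
  value.to_s.gsub(/[\t\r\n]/, " ")
end

file.puts [series_1['key'], series_1['channel']].map { |v| sanitize_field(v) }.join("\t")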
From the Ruby CSV docs, setting force_quotes: false in the options seems to work.
CSV.open("#{file_path}file.tab", "wb", col_sep: "\t", force_quotes: false) do |csv|
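Put together, a minimal sketch of the full block with that option, assuming the same file_path and series_1_results from the question (note that force_quotes already defaults to false, so the key change is not overriding quote_char):

require "csv"

CSV.open("#{file_path}file.tab", "wb", col_sep: "\t", force_quotes: false) do |csv|
  csv << ["Key", "channel"]
  series_1_results.each_hash do |series_1|
    csv << [series_1['key'], series_1['channel']]
  end
end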
The above does the trick. I'd suggest against setting quote_char to \0, since that doesn't work as expected.
There is one thing to note, though. If a field is an empty string "", it will force the quote_char to be printed into the CSV, but strangely a nil value does not. So if you're expecting empty strings in the data, convert them into nil when writing to the CSV (maybe using the ActiveSupport presence method or something similar).
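A minimal sketch of that conversion, assuming ActiveSupport is available (presence returns nil for blank strings); without ActiveSupport, a simple value == "" ? nil : value check does the same thing:

require "active_support/core_ext/object/blank"

series_1_results.each_hash do |series_1|
  # presence turns "" into nil, so no quote_char is emitted for empty fields
  csv << [series_1['key'].presence, series_1['channel'].presence]
end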