
Ruby won't parse CSV when cell/field contains leading double quote

Tags: ruby, csv
How would I go about parsing a CSV file when one of the columns contains a stray double quote (") character? I'm getting the error "Missing or stray quote in line 58 (CSV::MalformedCSVError)" because one field has an extra double quote in it. The data comes from an application that parses another device's (a firewall's) config. The " was added by the admin as part of a comment in that device's configuration, so it is beyond my control.

Example Input Data (can't provide the files, they are sensitive in nature):

"Table 1 Firewall Policy from INT to EXT administrative service rules on TestFirewall","1","Yes","Allow","[Group] GreenServer","[Host] Any","[Group] FTP","No",""Access"^M

As you can see, the comment in the last column is ""Access". The script I have so far appears to work perfectly well if there is just a double quote in the last column.

Minimum code required to replicate:

#!/usr/bin/env ruby
require 'csv'
require 'pp'
nipperfiles = Dir.glob(ARGV[0] + '/*.csv')

def allcsv(nipperfiles)
  filearray = []
  nipperfiles.each do |csv|
    filearray << csv
  end

  filearray
end

def devicetype(filelist)
  filelist.each do |f|
    CSV.foreach(f, :headers => true, :force_quotes => true, :encoding => Encoding::UTF_8) do |row|
      if row["Table"] =~ /audit device list/ && row["OS"] =~ /FortiOS/
        return "Fortigate"
      end
    end
  end
end

filelist = allcsv(nipperfiles)
device = devicetype(filelist)

Ideally the working code would just ignore the extra quote or replace it or any other potentially problematic characters. It is probably worth noting that given the original Firewall config is configured by a person, that person could put the extra quote in just about any cell/field.

hatlord asked Feb 07 '23 19:02


2 Answers

Here is a trick that may help. Use :quote_char => "'" (assuming no value in the CSV contains a single quote character). The parser then treats double quotes as ordinary characters, so they end up in the values it returns, and you can strip them in code:

Example:

CSV.foreach(f, :force_quotes => true, :encoding => Encoding::UTF_8,
               :quote_char => "'") do |row|
   puts row[0]
   #=> "Table 1 Firewall ... administrative service rules on TestFirewall"
   puts row[0][1..-2]
   #=> Table 1 Firewall ... administrative service rules on TestFirewall
end

FYI: You could use any character that is unlikely to appear in the CSV text as :quote_char, and the above solution will still work.
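
As a minimal sketch of this trick, the snippet below parses a shortened version of the sample row from the question and strips one outer double quote from each end of every field. The helper name parse_with_alternate_quote and the shortened row are made up for illustration, and it assumes the data never contains the alternate quote character.

```ruby
require 'csv'

# Shortened stand-in for the question's sample row; the last field
# carries the stray leading double quote.
line = %q{"Table 1","1","Yes",""Access"}

# Hypothetical helper: parse with a quote character that is assumed
# not to occur in the data, then strip one outer double quote from
# each end of every field.
def parse_with_alternate_quote(line, quote_char: "'")
  fields = CSV.parse_line(line, quote_char: quote_char)
  fields.map { |field| field.to_s.sub(/\A"/, '').sub(/"\z/, '') }
end

fields = parse_with_alternate_quote(line)
puts fields.inspect
```

Only the outermost quotes are removed, so the stray quote stays inside the last value and can be cleaned up separately if it is not wanted.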


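If the above does not work, you are better off processing each line as a String and calling split on it rather than using the CSV class. Note that a plain split on "," only works if no field contains a comma.

File.open("/path/to/file") do |f|
  f.each_line do |line|
    # Naive split: breaks if a field itself contains a comma
    columns = line.chomp.split(",")
  end
end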
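
A plain split(",") cuts any field that itself contains a comma. Since every field in the sample data is wrapped in double quotes, one possible variation is to split on the three-character delimiter "," instead. The sketch below assumes that every field is quoted and that the sequence "," never occurs inside a field; split_quoted_line is a hypothetical helper name and the sample line is invented.

```ruby
# Hypothetical helper. Assumes every field is wrapped in double quotes
# and that the sequence "," never appears inside a field.
def split_quoted_line(line)
  line.chomp                 # drop the trailing \n or \r\n
      .sub(/\A"/, '')        # drop the quote that opens the first field
      .sub(/"\z/, '')        # drop the quote that closes the last field
      .split('","', -1)      # -1 keeps trailing empty fields
end

# Invented line with a comma inside a field and a stray leading quote.
line = %Q{"Rule, one","No",""Access"\r\n}
fields = split_quoted_line(line)
puts fields.inspect
```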
Wand Maker answered May 20 '23 07:05


You can rescue from CSV::MalformedCSVError and create separate handlers for lines with such problems, but this means you'll have to parse every line separately and you lose column names from the header line.

require 'csv'

# File.foreach closes the file when it finishes iterating
File.foreach('csv.csv') do |input_row|
  begin
    CSV.parse(input_row) do |row|
      puts row.inspect
    end
  rescue CSV::MalformedCSVError => error
    # Collapse doubled quotes and parse the same line again
    if input_row.include?('""')
      input_row.gsub!('""', '"')
      retry
    else
      raise error
    end
  end
end

I'm a bit surprised there isn't an option like :on_malformed_csv => lambda ....
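
The column names lost by parsing line by line can be kept by reading the header line once and zipping it with each parsed row. The following is only a sketch under stated assumptions: parse_rows_with_headers is a hypothetical helper, the two sample lines are invented, and the repair rule (turn a doubled quote at the start of a field into a single one when it is followed by ordinary text) is a narrower variant of the gsub above that leaves legitimately empty "" fields alone.

```ruby
require 'csv'

# Hypothetical helper: the first line supplies the column names, every
# other line is parsed on its own and repaired once if it is malformed.
def parse_rows_with_headers(lines)
  headers = CSV.parse_line(lines.first)
  lines.drop(1).map do |raw|
    fields = begin
      CSV.parse_line(raw)
    rescue CSV::MalformedCSVError
      # Assumption: the only damage is an extra quote at the start of a
      # field. Empty fields ("") are followed by a comma or the line end
      # and therefore do not match.
      CSV.parse_line(raw.gsub(/(^|,)""(?=[^",\r\n])/, '\1"'))
    end
    headers.zip(fields).to_h
  end
end

# Invented sample data with the stray quote in the last field.
lines = [
  '"Table","Comment"',
  '"Table 1",""Access"'
]
rows = parse_rows_with_headers(lines)
puts rows.inspect
```

If the repaired line is still malformed, the second CSV.parse_line call raises again, so unexpected damage is not silently swallowed.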

Kimmo Lehto answered May 20 '23 06:05