
How do I parse an Excel file that will give me data exactly as it appears visually?

I'm on Rails 5 (Ruby 2.4). I want to read an .xls doc and I would like to get the data into CSV format, just as it appears in the Excel file. Someone recommended I use Roo, and so I have

book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
arr_of_arrs = CSV.parse(text)

However, what gets returned is not the same as what I see in the spreadsheet. For instance, a cell in the spreadsheet has

16:45.81

and when I get the CSV data from above, what is returned is

"0.011641319444444444"

How do I parse the Excel doc and get exactly what I see? I don't care whether I use Roo to parse or not, as long as I can get CSV data that represents what I see rather than some weird internal representation. For reference, the file I was parsing gives this when I run "file name_of_file.xls" ...

Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Dwight Schroot, Last Saved By: Dwight Schroot, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 21 17:05:21 2010, Last Saved Time/Date: Wed Oct 13 16:52:14 2010, Security: 0
Dave asked Mar 28 '17 19:03



1 Answer

You need to store the custom-formatted value as text on the .xls side. If you are downloading the .xls file from the internet and cannot edit it, this won't work, but it will fix your problem if you can manipulate the file. You can do this with the function =TEXT(A2, "mm:ss.0"), where A2 is just the cell I'm using as an example.

book = ::Roo::Spreadsheet.open(file_location)
puts book.cell('B', 2) 
=> '16:45.8'

If manipulating the file is not an option, you can pass a custom converter to CSV.new() and convert the decimal time (Excel stores a time as a fraction of a day) back to the format you need.

require 'roo-xls'
require 'csv'

CSV::Converters[:time_parser] = lambda do |field, info|
  # Only convert the "time" column; the header may carry stray whitespace.
  return field unless info.header.to_s.strip == "time"

  begin
    # Excel stores a time as a fraction of a day:
    # 0.011641319444444444 * 24 hours * 3600 seconds = 1005.81 seconds
    total_seconds = (Float(field) * 24 * 3600).round(2)
    # 1005.81.divmod(60) gives 16 minutes and roughly 45.81 seconds
    mm, ss = total_seconds.divmod(60)
    # Zero-pad the seconds so 5.81 becomes "05.81"; returns "16:45.81"
    format("%d:%05.2f", mm, ss)
  rescue ArgumentError, TypeError
    # Leave blank or non-numeric cells untouched
    field
  end
end

book = ::Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
csv = CSV.new(sheet.to_csv, headers: true, converters: [:time_parser]).map { |row| row.to_hash }
puts csv
=> {"time "=>"16:45.81"}
   {"time "=>"12:46.00"}
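
To check the conversion without the .xls file or Roo, the arithmetic can be tried against an inline CSV string. This is only a sketch: excel_fraction_to_mmss is a hypothetical helper name, the sample string stands in for what sheet.to_csv might return, and padding the seconds to two decimals is a formatting choice made here, not something the spreadsheet dictates.

```ruby
require 'csv'

# Hypothetical helper (not part of Roo or CSV): turn an Excel day fraction
# into "m:ss.ss". Excel stores a time of day as a fraction of 24 hours.
def excel_fraction_to_mmss(fraction)
  total_seconds = (Float(fraction) * 24 * 3600).round(2)
  minutes, seconds = total_seconds.divmod(60)
  format('%d:%05.2f', minutes, seconds)
end

# Assumed sample data standing in for sheet.to_csv; note the header's
# trailing space, as in the answer's output.
sample_csv = "time \n0.011641319444444444\n0.008865740740740741\n"

# Convert only the "time" column and pass every other field through.
converter = lambda do |field, info|
  info.header.to_s.strip == 'time' ? excel_fraction_to_mmss(field) : field
end

rows = CSV.parse(sample_csv, headers: true, converters: [converter]).map(&:to_h)
rows.each { |row| puts row['time '] }
```

Rounding the total seconds before splitting into minutes and seconds avoids a result such as "16:60.00" when the fractional seconds would otherwise round up to a full minute.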
David Gross answered Sep 28 '22 21:09