I'm using Rails 5. I want to parse an .xls file (not to be confused with an .xlsx file) using the code below:
book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
csv = CSV.parse(text)
arr_of_arrs = csv
text_content = ""
arr_of_arrs.each do |arr|
arr.map!{|v| v && v.to_f < 1 && v.to_f > 0 ? TimeFormattingHelper.time_as_str(v.to_f * 24 * 3600 * 1000) : v}
text_content = "#{text_content}\n#{arr.join("\t")}"
end
Here is the method referenced above:
def time_as_str(time_in_ms)
regex = /^(0*:?)*0*/
Time.at(time_in_ms.to_f/1000).utc.strftime("%H:%M:%S.%1N").sub!(regex, '')
end
One area where I'm having trouble is that a cell that appears in my .xls doc as
24:08:00
is processed as
1904-01-02T00:08:00+00:00
with the code above. How do I parse the value I see on the screen? That is, how do I convert the date value into a time value?
As an example from another Excel doc, the cell that appears as
24:02:00
is getting parsed by my code above as
1899-12-31T00:02:00+00:00
You can also press CTRL+1 to open the Format Cells dialog box. In the Category list, click Date or Time. In the Type list, click the date or time format that you want to use.
It seems your .xls uses the 1904 date system, and Roo cannot distinguish between a duration and a DateTime, so you'll need to subtract the base date 1904-01-01 from the cell value. Oddly enough, for the 1900 date system you need to subtract the base date 1899-12-30 instead, because of a leap-year bug in Lotus 1-2-3 that Microsoft replicated in Excel for compatibility.
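To illustrate why the two base dates differ, here is a minimal sketch using only Ruby's standard library. It assumes the usual description of the 1900 system, in which serial 1 is 1900-01-01 and a nonexistent 1900-02-29 is counted as serial 60, so that from March 1900 onward the effective day zero is 1899-12-30. The constant and method names are hypothetical, not part of Roo.

```ruby
require "date"

# Hypothetical constants for the two Excel date systems.
EXCEL_1900_BASE = Date.new(1899, 12, 30) # effective day zero for March 1900 onward
EXCEL_1904_BASE = Date.new(1904, 1, 1)   # serial 0 in the 1904 system

# Convert a whole-day serial number into a calendar date.
def serial_to_date(serial, base)
  base + serial
end

puts serial_to_date(61, EXCEL_1900_BASE) # prints 1900-03-01
puts serial_to_date(1, EXCEL_1904_BASE)  # prints 1904-01-02
```

A duration of just over 24 hours is stored as a serial slightly above 1, which is why it surfaces as base date plus one day in either system.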
Here is a method that converts the DateTime read from the spreadsheet into the duration according to the base date:
def duration_as_str(datetime, base_date)
total_seconds = DateTime.parse(datetime).to_i - base_date.to_i
hours = total_seconds / (60 * 60)
minutes = (total_seconds / 60) % 60
seconds = total_seconds % 60
"%d:%02d:%02d" % [hours, minutes, seconds]
end
Let's test it:
irb(main):019:0> duration_as_str("1904-01-02T00:08:00+00:00", DateTime.new(1904, 1, 1))
=> "24:08:00"
irb(main):020:0> duration_as_str("1899-12-31T00:02:00+00:00", DateTime.new(1899, 12, 30))
=> "24:02:00"
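Note that DateTime#to_i is provided by ActiveSupport, so the method above relies on Rails being loaded. Outside Rails, the same subtraction could be sketched with the standard library alone, as below. The method name duration_from_base and the two constants are hypothetical, and the sketch assumes the cell value is an ISO 8601 string like the ones in the question.

```ruby
require "time"

# Hypothetical stdlib-only variant of duration_as_str.
# iso_string: the timestamp as Roo emitted it; base_time: the workbook's base date.
def duration_from_base(iso_string, base_time)
  # Time - Time yields a Float number of seconds; round to whole seconds.
  total_seconds = (Time.iso8601(iso_string) - base_time).round
  hours, remainder = total_seconds.divmod(3600)
  minutes, seconds = remainder.divmod(60)
  format("%d:%02d:%02d", hours, minutes, seconds)
end

BASE_1904 = Time.utc(1904, 1, 1)
BASE_1900 = Time.utc(1899, 12, 30)

puts duration_from_base("1904-01-02T00:08:00+00:00", BASE_1904) # prints 24:08:00
puts duration_from_base("1899-12-31T00:02:00+00:00", BASE_1900) # prints 24:02:00
```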
You can use book.workbook.date_base.year to determine the spreadsheet's date system, and then just add another map inside your each loop:
book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
csv = CSV.parse(text)
base_date = book.workbook.date_base.year == 1904 ? DateTime.new(1904, 1, 1) : DateTime.new(1899, 12, 30)
arr_of_arrs = csv
text_content = ""
arr_of_arrs.each do |arr|
arr.map!{|v| v && v.to_f < 1 && v.to_f > 0 ? TimeFormattingHelper.time_as_str(v.to_f * 24 * 3600 * 1000) : v}
arr.map!{|v| v =~ /^(1904|1899)-/ ? duration_as_str(v, base_date) : v}
text_content = "#{text_content}\n#{arr.join("\t")}"
end
You could use something like the below and write a custom parser for that string.
duration = 0
"24:08:01".split(":").each_with_index do |value, i|
if i == 0
duration += value.to_i.hours
elsif i == 1
duration += value.to_i.minutes
else
duration += value.to_i.seconds
end
end
duration.value # => 86881 (duration in seconds)
This parser assumes a format of hours:minutes:seconds and returns an instance of ActiveSupport::Duration. Then, duration.value will give you the number of seconds.
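If only the total number of seconds is needed, the same parsing can be sketched without ActiveSupport. The helper name parse_duration_seconds is hypothetical, and the sketch assumes the string always has exactly three colon-separated integer fields.

```ruby
# Hypothetical helper: parse "H:MM:SS" into a total number of seconds.
def parse_duration_seconds(text)
  # Integer(part, 10) forces base 10 so that "08" is read as 8, and it
  # raises ArgumentError on non-numeric input instead of returning 0.
  hours, minutes, seconds = text.split(":").map { |part| Integer(part, 10) }
  hours * 3600 + minutes * 60 + seconds
end

puts parse_duration_seconds("24:08:01") # prints 86881
```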