
Converting Excel to CSV efficiently in Ruby

I used the spreadsheet gem to do this. It works, but it can be very slow at times. I also tried the Roo gem, but that didn't improve performance. Is there a better way to do this job? The odd thing is that some worksheets in the same Excel file convert quickly while others are very slow, taking up to 1 hour.

Could we use OpenOffice to open each worksheet (tab) in a single Excel file and convert them to CSV much faster? If so, how would I do it in Ruby?

Or is there an even better solution?

Here is a small example I tried with the Roo gem:

xls = Roo::Excel.new(source_excel_file)
xls.each_with_pagename do |name, sheet|
  # p sheet.to_csv(File.join(dest_csv_dir,name + ".csv"))
  #sheet.parse(:clean => true)#.to_csv(File.join(dest_csv_dir,name + ".csv"))
  puts name
  puts sheet.parse(:clean => true)
end
Arunachalam asked May 20 '14 16:05

1 Answer

Cowardly Preface: I am SUPER new to Ruby and know almost nothing about Rails, but I have tangled with Excel before. I created a dummy workbook on my local machine with 5 sheets, each containing 10 columns and 1000 rows of randomly generated numbers. I converted each sheet into its own CSV with this:

require 'win32ole'
require 'csv'

# configure a workbook, turn off excel alarms
xl = WIN32OLE.new('excel.application')
book = xl.workbooks.open('C:\stack\my_workbook.xlsx')
xl.displayalerts = false

# loop through all worksheets in the excel file
book.worksheets.each do |sheet|
  last_row = sheet.cells.find(what: '*', searchorder: 1, searchdirection: 2).row
  last_col = sheet.cells.find(what: '*', searchorder: 2, searchdirection: 2).column
  # use the block form so each CSV file is closed once its sheet is written
  File.open('C:\\stack\\' + sheet.name + '.csv', 'w') do |export|
    # loop through each column in each row and write to CSV
    (1..last_row).each do |xlrow|
      csv_row = (1..last_col).map { |xlcol| sheet.cells(xlrow, xlcol).value }
      export << CSV.generate_line(csv_row)
    end
  end
end

# clean up
book.close(savechanges: false)
xl.displayalerts = true
xl.quit

An eyeball benchmark for this script was ~30 seconds, with each attempt coming in a few seconds above or below that.
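
The script above makes one COM call per cell, which is likely where most of the time goes. A possible refinement is to pull a whole sheet's values in one call and hand the resulting 2D array to the CSV library. The sketch below shows only the CSV half, with stand-in data so it runs without Excel. `rows_to_csv` is a hypothetical helper, not part of the script above, and the idea that `sheet.usedrange.value` returns an array of row arrays for a multi-cell range is an assumption I have not benchmarked.

```ruby
require 'csv'

# Hypothetical helper: turns a 2D array of cell values into CSV text.
# nil cells (empty Excel cells) become empty fields.
def rows_to_csv(rows)
  CSV.generate do |csv|
    rows.each { |row| csv << row }
  end
end

# Stand-in data so the sketch runs without Excel. With WIN32OLE, the rows
# would come from something like `sheet.usedrange.value` (an assumption,
# see the note above).
sample_rows = [[1.0, 2.5, nil], ['a,b', 'plain', 3]]
csv_text = rows_to_csv(sample_rows)
puts csv_text
```

Inside the worksheet loop, the returned string could then be written in one go with `File.write`, in place of the nested row and column loops.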

Dan Wagner answered Oct 21 '22 19:10
