Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing XLS and XLSX (MS Excel) files with Ruby?

Tags:

ruby

excel

Are there any gems able to parse XLS and XLSX files? I've found Spreadsheet and ParseExcel, but they both don't understand XLSX format.

like image 423
Daniel Avatar asked Jul 23 '10 18:07

Daniel


People also ask

What is the difference between an XLS and XLSX file?

The File Format DifferenceData is arranged in an XLS file as binary streams in the form of a compound file as described in [MS-XLS]. In contrast, an XLSX file is based on Office Open XML format that stores data in compressed XML files in ZIP format.

How do you parse an Excel spreadsheet?

Click the “Data” tab in the ribbon, then look in the "Data Tools" group and click "Text to Columns." The "Convert Text to Columns Wizard" will appear. In step 1 of the wizard, choose “Delimited” > Click [Next]. A delimiter is the symbol or space which separates the data you wish to split.

How do I convert XLS to XLSX?

Click on the Office button, then Convert. You can also try opening the . xls file in Excel > then use the Save As command to save as an "Excel Workbook". Once you select that file type, the file extension will be xlsx and you can delete the old xls version.


1 Answers

I recently needed to parse some Excel files with Ruby. The abundance of libraries and options turned out to be confusing, so I wrote a blog post about it.

Here is a table of different Ruby libraries and what they support:

enter image description here

If you care about performance, here is how the xlsx libraries compare: enter image description here

I have sample code to read xlsx files with each supported library here

Here are some examples for reading xlsx files with some different libraries:

rubyXL

require 'rubyXL'  workbook = RubyXL::Parser.parse './sample_excel_files/xlsx_500_rows.xlsx' worksheets = workbook.worksheets puts "Found #{worksheets.count} worksheets"  worksheets.each do |worksheet|   puts "Reading: #{worksheet.sheet_name}"   num_rows = 0   worksheet.each do |row|     row_cells = row.cells.map{ |cell| cell.value }     num_rows += 1   end   puts "Read #{num_rows} rows" end 

roo

require 'roo'  workbook = Roo::Spreadsheet.open './sample_excel_files/xlsx_500_rows.xlsx' worksheets = workbook.sheets puts "Found #{worksheets.count} worksheets"  worksheets.each do |worksheet|   puts "Reading: #{worksheet}"   num_rows = 0   workbook.sheet(worksheet).each_row_streaming do |row|     row_cells = row.map { |cell| cell.value }     num_rows += 1   end   puts "Read #{num_rows} rows"  end 

creek

require 'creek'  workbook = Creek::Book.new './sample_excel_files/xlsx_500_rows.xlsx' worksheets = workbook.sheets puts "Found #{worksheets.count} worksheets"  worksheets.each do |worksheet|   puts "Reading: #{worksheet.name}"   num_rows = 0   worksheet.rows.each do |row|     row_cells = row.values     num_rows += 1   end   puts "Read #{num_rows} rows" end 

simple_xlsx_reader

require 'simple_xlsx_reader'  workbook = SimpleXlsxReader.open './sample_excel_files/xlsx_500000_rows.xlsx' worksheets = workbook.sheets puts "Found #{worksheets.count} worksheets"  worksheets.each do |worksheet|   puts "Reading: #{worksheet.name}"   num_rows = 0   worksheet.rows.each do |row|     row_cells = row     num_rows += 1   end   puts "Read #{num_rows} rows" end 

Here is an example of reading a legacy xls file using the spreadsheet library:

spreadsheet

require 'spreadsheet'  # Note: spreadsheet only supports .xls files (not .xlsx) workbook = Spreadsheet.open './sample_excel_files/xls_500_rows.xls' worksheets = workbook.worksheets puts "Found #{worksheets.count} worksheets"  worksheets.each do |worksheet|   puts "Reading: #{worksheet.name}"   num_rows = 0   worksheet.rows.each do |row|     row_cells = row.to_a.map{ |v| v.methods.include?(:value) ? v.value : v }     num_rows += 1   end   puts "Read #{num_rows} rows" end 
like image 124
mattnedrich Avatar answered Oct 15 '22 21:10

mattnedrich