Are there any gems able to parse XLS and XLSX files? I've found Spreadsheet and ParseExcel, but they both don't understand XLSX format.
The File Format DifferenceData is arranged in an XLS file as binary streams in the form of a compound file as described in [MS-XLS]. In contrast, an XLSX file is based on Office Open XML format that stores data in compressed XML files in ZIP format.
Click the “Data” tab in the ribbon, then look in the "Data Tools" group and click "Text to Columns." The "Convert Text to Columns Wizard" will appear. In step 1 of the wizard, choose “Delimited” > Click [Next]. A delimiter is the symbol or space which separates the data you wish to split.
Click on the Office button, then Convert. You can also try opening the . xls file in Excel > then use the Save As command to save as an "Excel Workbook". Once you select that file type, the file extension will be xlsx and you can delete the old xls version.
I recently needed to parse some Excel files with Ruby. The abundance of libraries and options turned out to be confusing, so I wrote a blog post about it.
Here is a table of different Ruby libraries and what they support:
If you care about performance, here is how the xlsx
libraries compare:
I have sample code to read xlsx files with each supported library here
Here are some examples for reading xlsx
files with some different libraries:
rubyXL
require 'rubyXL' workbook = RubyXL::Parser.parse './sample_excel_files/xlsx_500_rows.xlsx' worksheets = workbook.worksheets puts "Found #{worksheets.count} worksheets" worksheets.each do |worksheet| puts "Reading: #{worksheet.sheet_name}" num_rows = 0 worksheet.each do |row| row_cells = row.cells.map{ |cell| cell.value } num_rows += 1 end puts "Read #{num_rows} rows" end
roo
require 'roo' workbook = Roo::Spreadsheet.open './sample_excel_files/xlsx_500_rows.xlsx' worksheets = workbook.sheets puts "Found #{worksheets.count} worksheets" worksheets.each do |worksheet| puts "Reading: #{worksheet}" num_rows = 0 workbook.sheet(worksheet).each_row_streaming do |row| row_cells = row.map { |cell| cell.value } num_rows += 1 end puts "Read #{num_rows} rows" end
creek
require 'creek' workbook = Creek::Book.new './sample_excel_files/xlsx_500_rows.xlsx' worksheets = workbook.sheets puts "Found #{worksheets.count} worksheets" worksheets.each do |worksheet| puts "Reading: #{worksheet.name}" num_rows = 0 worksheet.rows.each do |row| row_cells = row.values num_rows += 1 end puts "Read #{num_rows} rows" end
simple_xlsx_reader
require 'simple_xlsx_reader' workbook = SimpleXlsxReader.open './sample_excel_files/xlsx_500000_rows.xlsx' worksheets = workbook.sheets puts "Found #{worksheets.count} worksheets" worksheets.each do |worksheet| puts "Reading: #{worksheet.name}" num_rows = 0 worksheet.rows.each do |row| row_cells = row num_rows += 1 end puts "Read #{num_rows} rows" end
Here is an example of reading a legacy xls
file using the spreadsheet
library:
spreadsheet
require 'spreadsheet' # Note: spreadsheet only supports .xls files (not .xlsx) workbook = Spreadsheet.open './sample_excel_files/xls_500_rows.xls' worksheets = workbook.worksheets puts "Found #{worksheets.count} worksheets" worksheets.each do |worksheet| puts "Reading: #{worksheet.name}" num_rows = 0 worksheet.rows.each do |row| row_cells = row.to_a.map{ |v| v.methods.include?(:value) ? v.value : v } num_rows += 1 end puts "Read #{num_rows} rows" end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With