Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to load old Microsoft Office XML file (Excel) using Java

I'm not able to load an Excel file in the older Office XML format (think Office 2002 or 2003 version) into Java. I tried JXL and Apache's POI (version 3.7). POI doesn't work since it appears to want the newer Office .xlsx format.

Here's an example of the older Office XML format.

One can generate a similar XML file from MS Excel 2010 by saving the workbook as the format "XML Spreadsheet 2003"?

Are there any open-source Java libraries that will load the XMLSS format? Otherwise I have no choice but to write a custom parser: read the XML file then interpret the cell tags to build out the cell matrix. In this XML format, any rows with empty cell values are skipped, the next cell with data positioned with an index attribute that acts like an offset in the columns, I assume to save space in the XML file.

like image 919
drewxmlss Avatar asked Aug 17 '11 07:08

drewxmlss


2 Answers

The format is called SpreadsheetML (do not confuse with .xlsx which is also xml-based), a library called Xelem can handle it:

import nl.fountain.xelem.excel.Workbook;
import nl.fountain.xelem.lex.ExcelReader;
//...
ExcelReader reader = new ExcelReader();
Workbook xlWorkbook = reader.getWorkbook("c:\\my\\spreadsheet.xml");
System.out.println(xlWorkbook.getSheetNames());
like image 167
Ilya Kharlamov Avatar answered Sep 28 '22 15:09

Ilya Kharlamov


Copying Mark Beardsley's answer from POI team http://apache-poi.1045710.n5.nabble.com/How-to-convert-xml-to-xls-td2306602.html :

You have got an Office 2003 xml file there, not an OpenXML file; it is an early attempt by Microsoft to create an xml based file format for Excel and it is in that sense a 'valid' Office file format.

Sadly, POI cannot interpret this file at all and that is why you saw the exception when you tried to wrap it up in the InputStream and pass it to WorkbookFactory(s) constructor. You do however have a number of options;

  • You could use Excel itself and manually open and save each file you wish to convert, as you already have done.
  • If you have access to Visual Studio and can write Visual Basic or C# code then you could use a control that will allow you to control Excel programmatically. This way you could automate a file conversion process using Excel itself. Then once the file has been converted wither to the binary or OpenXML formats, POI can be used to process it.
  • If you are running on a stand alone PC on which a copy of Excel is installed and using the Windows operating system, then you could use OLE to do something very similar from Java code. As above, POI can be used to process the file following the conversion.
  • If you have access to OpenOffice, it has a rather good API that is accessible from Java code. You could use it to convert between the file types for you - it is simply a matter of discovering the correct filter to use in this case. OpenOffice is good for all except the most complex files and you should be able to use POI to process the file following conversion. However, if you choose this route, it may be best to do all of the work using OpenOffice's UNO api.
  • Depending upon what you want to do with the file's contents, you could create your own parser using core java code and either the SAX or Xerces parsers (consider using xmlBeans (http://xmlbeans.apache.org/) ). If you simply open the original xml file using a simple text editor, you can see that the structure is not complex and, if all you wish to get at is the raw data it contains, this could be your best option.
like image 26
Andrey Avatar answered Sep 28 '22 16:09

Andrey