I would like to know if anyone has any advice on handling damaged files with Apache POI.
I am trying to open a file and am receiving this exception:
Exception in thread "main" org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x1C left 2 bytes remaining still to be read.
    at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:156)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:231)
    at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:480)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:301)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:95)
    at ExcelImporter.EditFileImportDialog.main(EditFileImportDialog.java:409)

Here is an SSCCE:
    import java.io.File;
    import java.io.IOException;

    import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;

    public class EditFileImportDialog {

        /* Omitted irrelevant code */

        public static void main(String[] args) {
            File file = new File("Z:\\Path\\To\\File_causing_the_trouble.xls");
            try {
                // Line 409 for ref to the exception stack trace
                Workbook wb = WorkbookFactory.create(file);
                System.out.println(wb);
            } catch (InvalidFormatException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

This happens with this file only. The exception is not thrown if I first open the file in Excel, save it, and then open the saved copy with POI. Any suggestions as to how I could handle this?
EDIT:
As a note, my issue may be related to this question, but upgrading POI did not fix it, and my file differs from the one described there. I have searched for similar answers without success. If someone knows what is wrong with the Excel file itself, I could write something to patch the file.
EDIT 2
The file creation is not under my control. Excel fixes the file simply by opening and re-saving it. My question is whether anyone can think of a way to adjust or extend POI so that it handles this damaged file the way Excel does.
EDIT 3
In response to several comments/answers:
My end goal is to not use Excel at all.
Simply defined, a corrupted file is a damaged file. Whatever the cause (there are several), data in the file was fundamentally changed to the point it can no longer be used.
Often, a file conversion alone repairs a corrupt file. If it does not, and you need to recover the information, try a file repair utility. There are both free and paid tools, such as Hetman, Repair Toolbox, or FileRepair.
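To make "damaged" concrete for the exception in the question: the workbook stream inside an .xls file is a sequence of records, each starting with a 2-byte type ID and a 2-byte payload length (both little-endian), followed by the payload. The message "Initialisation of record 0x1C left 2 bytes remaining" suggests that the record declared 2 more bytes than POI's parser for that type consumed. The following is only a sketch to illustrate that layout, not POI code: `BiffRecordWalker` and `RecordInfo` are hypothetical names, the sample bytes are made up, and it assumes you have already extracted the raw workbook stream from the OLE2 container (which this sketch does not do).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the record headers found in a raw workbook stream.
class BiffRecordWalker {

    /** Header of one record: type ID, declared payload length, payload offset. */
    static final class RecordInfo {
        final int sid;
        final int length;
        final int offset;

        RecordInfo(int sid, int length, int offset) {
            this.sid = sid;
            this.length = length;
            this.offset = offset;
        }
    }

    /** Walks the stream header by header, skipping each payload. */
    static List<RecordInfo> walk(byte[] stream) {
        List<RecordInfo> records = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        // Fewer than 4 trailing bytes cannot hold a header and are ignored here.
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;    // record type ID
            int length = buf.getShort() & 0xFFFF; // declared payload length
            if (length > buf.remaining()) {
                throw new IllegalStateException(String.format(
                        "Record 0x%02X declares %d bytes but only %d remain",
                        sid, length, buf.remaining()));
            }
            records.add(new RecordInfo(sid, length, buf.position()));
            buf.position(buf.position() + length); // skip the payload
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: a 0x1C record with a 2-byte payload, then an empty record.
        byte[] sample = {0x1C, 0x00, 0x02, 0x00, 0x11, 0x22, 0x0A, 0x00, 0x00, 0x00};
        for (RecordInfo r : walk(sample)) {
            System.out.printf("sid=0x%02X length=%d offset=%d%n", r.sid, r.length, r.offset);
        }
    }
}
```

Comparing the declared length of the 0x1C record in the failing file with the same record in the copy re-saved by Excel may show which bytes Excel dropped or rewrote.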
You could try using HSSFWorkbook to open .xls files.
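You could use the following code to check whether POI recognizes the file as an Excel format. Both checks peek at the first bytes of the stream, so the stream should support mark/reset (for example, a BufferedInputStream) or be a PushbackInputStream:

    private boolean isExcel(InputStream i) throws IOException {
        return (POIFSFileSystem.hasPOIFSHeader(i) || POIXMLDocument.hasOOXMLHeader(i));
    }

I would use:

    InputStream input = new FileInputStream(fileName);

instead of:

    File file = new File("Z:\\Path\\To\\File_causing_the_trouble.xls");

Did you check what is wrong with record 0x1C in your file? (The exception message reports 0x1C as a record type, not as a cell address.)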
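For reference, a header check like the isExcel method above comes down to comparing the first bytes of the file against known signatures. Here is a minimal JDK-only sketch of that idea. It is not POI's implementation, and `ExcelHeaderSniffer`, `Kind`, and `sniff` are hypothetical names. It assumes the standard OLE2 signature (used by .xls) and the ZIP local-file signature (used by .xlsx). A ZIP match only shows that the file is a ZIP archive, not that it is a valid workbook.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: classifies a stream by its leading signature bytes.
class ExcelHeaderSniffer {

    enum Kind { OLE2, ZIP, UNKNOWN }

    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Peeks at up to 8 bytes and resets the stream to where it started. */
    static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Wrap the stream in a BufferedInputStream first");
        }
        byte[] head = new byte[8];
        in.mark(head.length);
        int n = 0;
        while (n < head.length) {
            int read = in.read(head, n, head.length - n);
            if (read < 0) {
                break; // stream is shorter than 8 bytes
            }
            n += read;
        }
        in.reset();
        if (startsWith(head, n, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, n, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] head, int available, byte[] magic) {
        if (available < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Made-up in-memory sample standing in for the start of a .xlsx file.
        InputStream in = new ByteArrayInputStream(new byte[] {0x50, 0x4B, 0x03, 0x04, 0, 0});
        System.out.println(sniff(in));
    }
}
```

A damaged .xls like the one in the question would most likely still pass this check, because the corruption reported by the exception is in a record inside the workbook stream rather than in the file header.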